Abstract

It is commonly assumed that a person’s emotional state can be readily inferred from his or her facial movements, 
typically called emotional expressions or facial expressions. This assumption influences legal judgments, policy decisions, 
national security protocols, and educational practices; guides the diagnosis and treatment of psychiatric illness, as well 
as the development of commercial applications; and pervades everyday social interactions as well as research in other 
scientific fields such as artificial intelligence, neuroscience, and computer vision. In this article, we survey examples 
of this widespread assumption, which we refer to as the common view, and we then examine the scientific evidence 
that tests this view, focusing on the six most popular emotion categories used by consumers of emotion research: anger, 
disgust, fear, happiness, sadness, and surprise. The available scientific evidence suggests that people do sometimes 
smile when happy, frown when sad, scowl when angry, and so on, as proposed by the common view, more than what 
would be expected by chance. Yet how people communicate anger, disgust, fear, happiness, sadness, and surprise 
varies substantially across cultures, situations, and even across people within a single situation. Furthermore, similar 
configurations of facial movements variably express instances of more than one emotion category. In fact, a given 
configuration of facial movements, such as a scowl, often communicates something other than an emotional state. 
Scientists agree that facial movements convey a range of information and are important for social communication, 
emotional or otherwise. But our review suggests an urgent need for research that examines how people actually move 
their faces to express emotions and other social information in the variety of contexts that make up everyday life, as 
well as careful study of the mechanisms by which people perceive instances of emotion in one another. We make 
specific research recommendations that will yield a more valid picture of how people move their faces to express 
emotions and how they infer emotional meaning from facial movements in situations of everyday life. This research is 
crucial to provide consumers of emotion research with the translational information they require. 

Keywords 

emotion perception, emotional expression, emotion recognition 


Faces are a ubiquitous part of everyday life for humans.
People greet each other with smiles or nods. They have
face-to-face conversations on a daily basis, whether in
person or via computers. They capture faces with
smartphones and tablets, exchanging photos of themselves
and of each other on Instagram, Snapchat, and other
social-media platforms. The ability to perceive faces is
one of the first capacities to emerge after birth: An
infant begins to perceive faces within the first few days
of life, equipped with a preference for face-like
arrangements that allows the brain to wire itself, with
experience, to become expert at perceiving faces (Arcaro,
Schade, Vincent, Ponce, & Livingstone, 2017; Cassia,
Turati, & Simion, 2004; Gandhi, Singh, Swami, Ganesh,
& Sinha, 2017; Grossmann, 2015; L. B. Smith, Jayaraman,
Clerkin, & Yu, 2018; Turati, 2004; but see Young and
Burton, 2018, for a more qualified claim). Faces offer
a rich, salient source of information for navigating the
social world: They play a role in deciding whom to
love, whom to trust, whom to help, and who is found
guilty of a crime (Todorov, 2017; Zebrowitz, 1997, 2017;
Zhang, Chen, & Yang, 2018). Beginning with the ancient
Greeks (Aristotle, in the 4th century BCE) and Romans
(Cicero), various cultures have viewed the human face
as a window on the mind. But to what extent can a
raised eyebrow, a curled lip, or a narrowed eye reveal
what someone is thinking or feeling, allowing a
perceiver’s brain to guess what that someone will do next?
The answers to these questions have major
consequences for human outcomes as they unfold in the
living room, the classroom, the courtroom, and even
on the battlefield. They also powerfully shape the
direction of research in a broad array of scientific fields,
from basic neuroscience to psychiatry.

Understanding what facial movements might reveal
about a person’s emotions is made more urgent by the
fact that many people believe they already know.
Specific configurations of facial-muscle movements
appear as if they summarily broadcast or display a
person’s emotions, which is why they are routinely
referred to as emotional expressions and facial
expressions. A simple Google search for the phrase
“emotional facial expressions” (see Box 1 in the
Supplemental Material available online) reveals the ubiquity
with which, at least in certain parts of the world, people
believe that certain emotion categories are reliably
signaled or revealed by certain facial-muscle movement
configurations, a set of beliefs we refer to as the common
view (also called the classical view; L. F. Barrett, 2017b).
Likewise, many cultural products testify to the common
view. Here are several examples:

• Technology companies are investing tremendous
resources to figure out how to objectively “read”
emotions in people by detecting their presumed
facial expressions, such as scowling faces,
frowning faces, and smiling faces, in an automated
fashion. Several companies claim to have already
done it (e.g., Affectiva.com, 2018; Microsoft Azure,
2018). For example, Microsoft’s Emotion API
promises to take video images of a person’s face
to detect what that individual is feeling.
Microsoft’s website states that its software “integrates
emotion recognition, returning the confidence
across a set of emotions . . . such as anger,
contempt, disgust, fear, happiness, neutral, sadness,
and surprise. These emotions are understood to
be cross-culturally and universally communicated
with particular facial expressions” (screen 3).

• Countless electronic messages are annotated with
emojis or emoticons that are schematized
versions of the proposed facial expressions for
various emotion categories (Emojipedia.org, 2019).

• Putative emotional expressions are taught to
preschool children by displaying scowling faces,
frowning faces, smiling faces, and so on, in
posters (e.g., use “feeling chart for children” in a
Google image search), games (e.g., Miniland
emotion games; Miniland Group, 2019), books (e.g.,
Cain, 2000; T. Parr, 2005), and episodes of Sesame
Street (among many examples, see Morenoff,
2014; Pliskin, 2015; Valentine & Lehmann, 2015).

• Television shows (e.g., Lie to Me; Baum & Grazer,
2009), movies (e.g., Inside Out; Docter, Del Carmen,
LeFauve, Cooley, and Lassetter, 2015), and
documentaries (e.g., The Human Face, produced by the
British Broadcasting Company; Cleese, Erskine, &
Stewart, 2001) customarily depict certain facial
configurations as universal expressions of
emotions.

• Magazine and newspaper articles routinely
feature stories in kind: Facial configurations
depicting a scowl are referred to as “expressions of
anger,” facial configurations depicting a smile are
referred to as “expressions of happiness,” facial
configurations depicting a frown are referred to
as “expressions of sadness,” and so on.

• Agents of the U.S. Federal Bureau of Investigation
(FBI) and the Transportation Security
Administration (TSA) were trained to detect emotions and
other intentions using these facial configurations,
with the goal of identifying and thwarting
terrorists (R. Heilig, special agent with the FBI, personal
communication, December 15, 2014; L. F. Barrett,
2017c).

• The facial configurations that supposedly diagnose
emotional states also figure prominently in the
diagnosis and treatment of psychiatric disorders.
One of the most widely used tasks in autism
research, the Reading the Mind in the Eyes Test,
asks test takers to match photos of the upper (eye)
region of a posed facial configuration with specific
mental state words, including emotion words
(Baron-Cohen, Wheelwright, Hill, Raste, & Plumb,
2001). Treatment plans for people living with
autism and other brain disorders often include
learning to recognize these facial configurations
as emotional expressions (Baron-Cohen, Golan,
Wheelwright, & Hill, 2004; Kouo & Egel, 2016).
This training does not generalize well to
real-world skills, however (Berggren et al., 2018; Kouo
& Egel, 2016).

• “Reading” the emotions of a defendant (in the
words of Supreme Court Justice Anthony Kennedy,
to “know the heart and mind of the offender”;
Riggins v. Nevada, 1992, p. 142) is one pillar of
a fair trial in the U.S. legal system and in many
legal systems in the Western world. Legal actors
such as jurors and judges routinely rely on facial
movements to determine the guilt and remorse
of a defendant (e.g., Bandes, 2014; Zebrowitz,
1997). For example, defendants who are
perceived as untrustworthy receive harsher
sentences than they otherwise would (J. P. Wilson &
Rule, 2015, 2016), and such perceptions are more
likely when a person appears to be angry (i.e.,
the person’s facial structure looks similar to the
hypothesized facial expression of anger, which is
a scowl; Todorov, 2017). An incorrect inference
about defendants’ emotional state can cost them
their children, their freedom, or even their lives
(for recent examples, see L. F. Barrett, 2017b,
beginning on page 183).

But can a person’s emotional state be reasonably
inferred from that person’s facial movements? In this
article, we offer a systematic review of the evidence,
testing the common view that instances of an emotion
category are signaled with a distinctive configuration
of facial movements that has enough reliability and
specificity to serve as a diagnostic marker of those
instances. We focus our review on evidence pertaining
to six emotion categories that have received the lion’s
share of attention in scientific research (anger, disgust,
fear, happiness, sadness, and surprise) and that,
correspondingly, are the focus of the common view (as
evidenced by our Google search, summarized in Box
1 in the Supplemental Material). Our conclusions apply,
however, to all emotion categories that have thus far
been scientifically studied. We open the article with a
brief discussion of its scope, approach, and intended
audience. We then summarize evidence on how people
actually move their faces during episodes of emotion,
referred to as studies of expression production,
following which we examine evidence on which emotions
are actually inferred from looking at facial movements,
referred to as studies of emotion perception. We
identify three key shortcomings in the scientific research
that have contributed to a general misunderstanding
about how emotions are expressed and perceived in
facial movements and that limit the translation of this
scientific evidence for other uses:


1. Limited reliability (i.e., instances of the same
emotion category are neither reliably expressed
through nor perceived from a common set of
facial movements).

2. Lack of specificity (i.e., there is no unique
mapping between a configuration of facial
movements and instances of an emotion category).

3. Limited generalizability (i.e., the effects of
context and culture have not been sufficiently
documented and accounted for).

We then discuss our conclusions, followed by proposals 
for consumers on how they might use the existing
scientific literature. We also provide recommendations for
future research on emotion production and perception 
with consumers of that research in mind. We have 
included additional detail on some topics of import or 
interest in the Supplemental Material. 

Scope, Approach, and Intended 
Audience of Article 

The common view: reading an 
emotional state from a set of facial 
movements 

In common English parlance, people refer to “an
emotion” as if anger, happiness, or any emotion word
referred to an event that is highly similar on most
occurrences. But an emotion word refers to a category of
instances that vary from one another in their physical
features (e.g., facial movements and bodily changes)
and mental features (e.g., pleasantness, arousal,
experience of the surrounding situation as novel or
threatening, awareness of these properties, and so on). Few
scientists who study emotion, if any, take the view that
every instance of an emotion category, such as anger,
is identical to every other instance, sharing a set of
necessary and sufficient features across situations,
people, and cultures. For example, Keltner and Cordaro
(2017) recently wrote that “there is no one-to-one
correspondence between a specific set of facial muscle
actions or vocal cues and any and every experience of
emotion” (p. 62). Yet there is considerable scientific
debate about the extent of the within-category
variation, the specific features that vary, the causes of the
within-category variation, and implications of this
variation for the nature of emotion (see Fig. 1).

One popular scientific framework, referred to as the
basic-emotion approach, hypothesizes that instances of
an emotion category are expressed with facial
movements that vary, to some degree, around a typical set of
movements (referred to as a prototype; for examples,
see Table 1). For example, it is hypothesized that in one
situation or for one person, anger might be expressed
with the facial prototype (e.g., brows furrowed, eyes
wide, lips tightened) plus additional facial movements,
such as a widened mouth, whereas on other occasions,
one facial movement from the prototype might be
missing (e.g., anger might be expressed with narrowed eyes
or without movement in the eyebrow region; for a
discussion, see Box 2 in the Supplemental Material).
Nonetheless, the basic-emotion approach still assumes that
there is a core facial configuration (the prototype) that
can be used to diagnose a person’s emotional state in
much the same way that a fingerprint can be used to
uniquely recognize a person. More substantial variation
in expressions (e.g., smiling in anger, gasping with
widened eyes in anger, and scowling not in anger but in
confusion or concentration) is typically explained as the
result of processes that are independent of an emotion
itself and that modify its prototypic expression, such as
display rules, emotion-regulation strategies (e.g.,
suppressing the expression), or culture-specific dialects (as
proposed by various scientists, including Ekman &
Cordaro, 2011; Elfenbein, 2013, 2017; Matsumoto, 1990;
Matsumoto, Keltner, Shiota, O’Sullivan, & Frank, 2008;
Tracy & Randles, 2011).

Fig. 1. Explanatory frameworks guiding the science of emotion: the nature of emotion categories and their concepts. The information in
the figure is plotted along two dimensions. The horizontal dimension represents hypotheses about the similarities in surface features shared
by instances of the same emotion category (e.g., the facial movements that express instances of the same emotion category). The vertical
dimension represents hypotheses about the similarities in the mechanisms that cause instances of the same emotion category (e.g., the neural
circuits or assemblies that cause instances of the same emotion category). The colors represent the type of emotion categories proposed in
each theoretical framework. Approaches in the green area describe ad hoc, abstract categories; those in the yellow area describe prototype
or theory-based categories, and those in the red area describe natural-kind categories.

By contrast, other scientific frameworks propose that 
expressions of the same emotion category, such as 
anger, vary substantially across different people and 
situations. For example, when the goal of being angry 
is to overcome an obstacle, it may be more useful to 
scowl during some instances of anger, smile or laugh, 
or even stoically widen one’s eyes, depending on the 
temporospatial context. This variation is thought to be 
a meaningful part of an emotional expression because 
facial movements are functionally tied to the immediate 
context, which includes a person’s internal context 
(e.g., the person’s metabolic condition, the past
experiences that come to mind) and outward context (e.g.,
whether a person is at work, at school, or at home, who 
else is present and the broader cultural conditions), 
both of which vary in dynamic ways over time (see Box 
2 in the Supplemental Material). 

These debates, regarding the source and magnitude
of variation in the facial movements that express
instances of the same emotion category, as well as the
magnitude and meaning of the similarity in the facial
movements that express instances of different emotion
categories, are useful to scientists. But these debates
do not provide clear guidance for consumers of
emotion research, who are focused on the practical issue of
whether emotion categories are expressed with facial
configurations of sufficient regularity and distinctiveness
so that it is possible to read emotion in a person’s face.

Proposed expressive configurations described using The Facial Action Coding System (FACS)
Note: Descriptions attributed to Darwin are taken from Matsumoto et al. (2008), Table 13.1. Physical descriptions are taken from Keltner et al. (2019). International core patterns refer to expressions
of 22 emotion categories that are thought to be conserved across cultures, taken from Cordaro et al. (2018), Tables 4 through 6. A plus sign means that action units would appear simultaneously. A
comma means that action units are statistically the most probable to appear but do not necessarily happen simultaneously (D. Cordaro, personal communication, November 11, 2018).

The common view of emotional expressions persists,
too, because scientists’ actions often do not follow their
claims in a transparent, straightforward way. Many
scientists continue to design experiments, use stimuli, and
publish review articles that, ironically, leave readers
with the impression that certain emotion categories
have a unique, prototypic facial expression, even as
those same scientists acknowledge that instances of
every emotion category can be expressed with a
variable set of facial movements. Published studies typically
test the hypothesis that there are unique
emotion-expression links (for examples, see the reference lists
in Elfenbein & Ambady, 2002; Keltner, Sauter, Tracy, &
Cowen, 2019; Matsumoto et al., 2008; also see most of
the studies reviewed in this article, e.g., Cordaro et al.,
2018). The exact facial configuration tested for each
emotion category varies slightly from study to study
(for examples, see Table 1), but a core, prototypic facial
configuration for a given emotion category is still
assumed within a single study. Review articles (again,
perhaps unintentionally) reinforce the impression of
unique face-emotion mappings by including tables and
figures that display a single, unique facial configuration
for each emotion category, referred to as the
expression, signal, or display for that emotion (Fig. 2 presents
two recent examples). This pattern of hypothesis
testing and writing (that instances of one emotion
category are expressed with a single prototypic facial
configuration) reinforces (perhaps unintentionally) the
common view that each emotion category is
consistently and uniquely expressed with its own distinctive
configuration of facial movements. Consumers of this
research then assume that a distinctive configuration
can be used to diagnose the presence of the
corresponding emotion in everyday life (e.g., that a scowl
indicates the presence of anger with high reliability and
specificity).

The common view of emotional expressions has also
been imported into other scientific disciplines with an
interest in understanding emotions, such as
neuroscience and artificial intelligence (AI). For example, from
a published article on AI:

    American psychologist Ekman noticed that some
    facial expressions corresponding to certain emotions
    are common for all the people independently of
    their gender, race, education, ethnicity, etc. He
    proposed the discrete emotional model using six
    universal emotions: happiness, surprise, anger,
    disgust, sadness and fear. (Brodny et al., 2016, p. 1;
    emphasis in original)

Similar examples come from our own articles. One
series focused on the brain structures involved in
perceiving emotions from facial configurations (Adolphs,
2002; Adolphs, Tranel, Damasio, & Damasio, 1994), and
the other focused on early life experiences (Pollak,
Cicchetti, Hornung, & Reed, 2000; Pollak & Kistler,
2002). These articles were framed in terms of
“recognizing facial expressions of emotion” and exclusively
presented participants with specific, posed photographs
of scowling faces (the presumed facial expression for
anger), wide-eyed, gasping faces (the presumed facial
expression for fear), and other presumed prototypical
expressions. Participants were shown faces of different
individuals, and each person posed the same facial
configuration for a given emotion category, ignoring
the importance of individual and contextual variation.
One reason for this flawed approach to investigating
the perception of emotion from faces was that then (at
the time these studies were conducted), as now,
published experiments, review articles, and stimulus sets
were dominated by the common view that certain
emotion categories were signaled with an invariant set of
facial configurations, referred to as “the facial
expressions of basic emotions.”

In our review of the scientific evidence, we test two
hypotheses that arise from the common view of
emotional expressions: that certain emotion categories are
each routinely expressed by a unique facial
configuration and, correspondingly, that people can reliably infer
someone else’s emotional state from a set of facial
movements. Our discussion is written for consumers of
emotion research, whether they be scientists in other
fields or nonscientists, who need not have deep
knowledge of the various theories, debates, and broad range
of findings in the science of emotion, with sufficient
pointers to those discussions if they are of interest (see
Box 2 in the Supplemental Material).

In discussing what this article is about (the common
view that a person’s emotional state is revealed in facial
movements), it bears mentioning what this article is not
about: It is not a referendum on the “basic emotion”
view that we mentioned briefly, earlier in this section,
proposed by the psychologist Paul Ekman and his
colleagues; nor is it a commentary on any other specific
research program or individual psychologist’s view.
Ekman’s theoretical approach has been highly
influential in research on emotion for much of the past 50
years. We often cite studies inspired by the
basic-emotion approach, and Ekman’s work, for this reason.
Fig. 2, panel (a) (example photos omitted)

State           Action Units                         Physical Description
Amusement       6 + 7 + 12 + 25 + 26 + 53            Head back, Duchenne smile, lips separated, jaw dropped
Anger           4 + 5 + 17 + 23 + 24                 Brows furrowed, eyes wide, lips tightened and pressed together
Boredom         43 + 55                              Eyelids drooping, head tilted (not scorable with FACS: slouched posture, head resting on hand)
Confusion       4 + 7 + 56                           Brows furrowed, eyelids narrowed, head tilted
Contentment     12 + 43                              Smile, eyelids drooping
Coyness         6 + 7 + 12 + 25 + 26 + 52 + 54 +     Duchenne smile, lips separated, head turned and down, eyes turned opposite to head turn
Desire          19 + 25 + 26 + 43                    Tongue shown, lips parted, jaw dropped, eyelids drooping
Disgust         7 + 9 + 19 + 25 + 26                 Eyes narrowed, nose wrinkled, lips parted, jaw dropped, tongue shown
Embarrassment   7 + 12 + 15 + 52 + 54 + 64           Eyelids narrowed, controlled smile, head turned and down (not scorable with FACS: hand touches face)
Fear            1 + 2 + 4 + 5 + 7 + 20 + 25          Eyebrows raised and pulled together, upper eyelid raised, lower eyelid tense, lips parted and stretched
Happiness       6 + 7 + 12 + 25 + 26                 Duchenne smile
                1 + 2 + 12                           Eyebrows raised, slight
Pain            4 + 6 + 7 + 9 + 17 + 18 + 23 + 24    Eyes tightly closed, nose wrinkled, brows furrowed, lips tight, pressed together, and slightly puckered
Pride           53 + 64                              Head up, eyes down
Sadness         1 + 4 + 6 + 15 + 17                  Brows knitted, eyes slightly tightened, lip corners depressed, lower lip raised
Shame           54 + 64                              Head down, eyes down
Surprise        1 + 2 + 5 + 25 + 26                  Eyebrows raised, upper eyelid raised, lips parted, jaw dropped
Sympathy        1 + 17 + 24 + 57                     Inner eyebrow raised, lower lip raised, lips pressed together, head slightly forward
Fig. 2, panel (b) (emotion expression photos omitted)

Happiness
Hypothesized physiological function: Research needed
Hypothesized communicative function: Communicates a lack of threat
Relevant research: Preuschoft & Van Hoof, 1997; Ramachandran, 1998

Sadness
Hypothesized physiological function: Research needed
Hypothesized communicative function: Tears handicap vision to signal appeasement and elicit sympathy
Relevant research: Hasson (2009)

Anger
Hypothesized physiological function: Research needed
Hypothesized communicative function: Alerts of impending threat, communicates dominance
Relevant research: Marsh, Ambady, & Kleck (2005); Wilkowski & Meier (2010)

Fear
Hypothesized physiological function: Widened eyes increase visual field and speed up eye movements
Hypothesized communicative function: Alerts of possible threat and appeases potential aggressors
Relevant research: Marsh et al. (2005); Ohman & Mineka (2001); Susskind et al. (2008)

Surprise
Hypothesized physiological function: Widened eyes increase visual field to see unexpected stimulus
Hypothesized communicative function: Research needed
Relevant research: Ekman (1989)

Disgust
Hypothesized physiological function: Constricted orifices reduce inhalation of possible contaminants
Hypothesized communicative function: Warns about aversive foods, as well as distasteful ideas and behaviors
Relevant research: Rozin et al. (1994); Chapman, Kim, Susskind, & Anderson (2009)

Pride
Hypothesized physiological function: Boosts testosterone and increases lung capacity to prepare for agonistic encounters
Hypothesized communicative function: Communicates heightened social status
Relevant research: Carney, Cuddy, & Yap (2010); Shariff & Tracy (2009); Tracy & Matsumoto (2008)

Shame
Hypothesized physiological function: Reduces/hides bodily targets from potential attack
Hypothesized communicative function: Communicates lessened social status, desire to appease
Relevant research: Keltner & Harker (1998); Shariff & Tracy (2009); Tracy & Matsumoto (2008)

Embarrassment
Hypothesized physiological function: Reduces/hides bodily targets from potential attack
Hypothesized communicative function: Communicates lessened social status, desire to appease
Relevant research: Keltner & Buswell (1997)

Fig. 2. Example figures from recently published articles that reinforce the common belief in prototypic facial expressions of
emotion. The graphic in (a) was adapted from Table 2 in Keltner, D., Sauter, D., Tracy, J., and Cowen, A. (2019). Emotional expression:
Advances in basic emotion theory. Journal of Non-Verbal Behavior. Photos originally from Cordaro, D. T., Sun, R., Keltner, D.,
Kamble, S., Huddar, N., and McNeil, G. (2018). Universals and cultural variations in 22 emotional expressions across five cultures.
Emotion, 18, 75-93, with permission from the American Psychological Association. Face photos copyright Dr. Lenny Kristal, used with
permission. The graphic in (b) was adapted from Figure 2 in Shariff and Tracy (2011).


In addition, the common view of emotional expressions 
is most readily associated with a simplified version of 
the basic-emotion approach, as exemplified by the 
quotes above. Critiques of Ekman’s basic-emotion view 
(and related views) are numerous (e.g., L. F. Barrett, 
2006, 2011; L. F. Barrett, Lindquist et al., 2007; Russell, 
1991, 1994, 1995; Ortony & Turner, 1990), as are
rejoinders that defend it (e.g., Ekman, 1992, 1994; Izard,
2007). Our article steps back from these debates. We 
instead focus on the existing research on emotional 
expression and emotion perception in general and ask 
whether the scientific evidence is sufficiently strong 
and clear enough to justify the way it is increasingly 
being used by those who consume it. 

A systematic approach for evaluating 
the scientific evidence 

When you see someone smile and infer that the person
is happy, you are making what is known as a reverse
inference: You are assuming that the smile reveals
something about the person’s emotional state that you
cannot access directly (see Fig. 3). Reverse inference
requires calculating a conditional probability: the
probability that a person is in a particular emotion
episode (e.g., happiness) given the observation of a
unique set of facial muscle movements (e.g., a smile).
The conditional probability is written as

p(emotion category | a unique facial configuration)

for example,

p(happiness | a smiling facial configuration)
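
As a minimal sketch of what this conditional probability means in practice, the Python snippet below estimates p(happiness | a smiling facial configuration) as a relative frequency and contrasts it with the forward (production) probability p(a smiling facial configuration | happiness). The observation pairs, the function names, and the resulting numbers are hypothetical and serve only as an illustration; they are not data from any study reviewed here.

```python
# Hypothetical (emotion, facial configuration) pairs, invented for illustration only.
observations = [
    ("happiness", "smile"),
    ("happiness", "smile"),
    ("happiness", "neutral"),
    ("happiness", "frown"),
    ("anger", "smile"),
    ("sadness", "smile"),
    ("fear", "smile"),
    ("sadness", "frown"),
    ("anger", "scowl"),
]


def p_emotion_given_configuration(pairs, emotion, configuration):
    """Reverse inference: share of `emotion` among instances showing `configuration`."""
    emotions_seen = [e for e, c in pairs if c == configuration]
    if not emotions_seen:
        raise ValueError(f"no instances of configuration {configuration!r}")
    return emotions_seen.count(emotion) / len(emotions_seen)


def p_configuration_given_emotion(pairs, emotion, configuration):
    """Forward direction: share of `configuration` among instances of `emotion`."""
    configurations_seen = [c for e, c in pairs if e == emotion]
    if not configurations_seen:
        raise ValueError(f"no instances of emotion {emotion!r}")
    return configurations_seen.count(configuration) / len(configurations_seen)


reverse = p_emotion_given_configuration(observations, "happiness", "smile")
forward = p_configuration_given_emotion(observations, "happiness", "smile")
print(f"p(happiness | smile) = {reverse:.2f}")
print(f"p(smile | happiness) = {forward:.2f}")
```

In this toy sample the two quantities differ, which is the point of the distinction: knowing how often happy people smile does not by itself tell a perceiver how often a smile signals happiness.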

Fig. 3. Defining reliability and specificity. Anger and fear are used as the example categories.

Facial movement: Brow Lowerer (AU4)
Emotion state angry: True positive. Person is angry and producing facial movement thought to be expressive for anger.
Emotion state afraid: False positive. Person is afraid but producing facial movement thought to be typical for anger.

Facial movement: Stretch Lips (AU20)
Emotion state angry: False negative. Person is angry but producing facial movement thought to be expressive for fear.
Emotion state afraid: True negative. Person is afraid and producing facial movement thought to be expressive for fear.

High reliability: A scowling facial configuration occurs frequently when someone is angry.
High specificity: A scowling facial configuration occurs rarely when someone is not angry.
Positive predictive value: The probability that someone who is angry is scowling.
Negative predictive value: The probability that someone who is fearful (not angry) has tightened lips.

Reverse inferences about emotion are ubiquitous in
everyday life: whenever you experience someone as
emotional, your brain has performed a reverse inference,
guessing at the cause of a facial movement when you
have access only to the movement itself. Every time an
app on a phone or computer measures someone’s facial
muscle movements, identifies a facial configuration such
as a frowning facial configuration, and proclaims that
the target person is sad, that app has engaged in reverse
inference, such as

p(sadness | a frowning facial configuration)

Whenever a security agent infers anger from a scowl, 
the agent has assumed a strong likelihood for 

p(anger|a scowling facial configuration) 
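
To illustrate why a reverse inference such as p(anger | a scowling facial configuration) cannot be read off from how often angry people scowl, the sketch below applies Bayes' rule. It assumes three inputs that are not estimated in this article: a reliability value p(scowl | anger), a false-positive rate p(scowl | not anger), which is 1 minus specificity in the usual measurement sense, and a base rate p(anger). The function name and every number are hypothetical.

```python
def reverse_inference(reliability, false_positive_rate, base_rate):
    """Return p(emotion | configuration) via Bayes' rule.

    reliability:         p(configuration | emotion)
    false_positive_rate: p(configuration | not emotion), i.e., 1 - specificity
    base_rate:           p(emotion) in the situation being considered
    """
    inputs = {
        "reliability": reliability,
        "false_positive_rate": false_positive_rate,
        "base_rate": base_rate,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Probability of seeing the configuration together with the emotion.
    numerator = reliability * base_rate
    # Total probability of seeing the configuration at all.
    evidence = numerator + false_positive_rate * (1.0 - base_rate)
    if evidence == 0.0:
        raise ValueError("the configuration never occurs under these inputs")
    return numerator / evidence


# Hypothetical values: scowls accompany 30% of anger instances and 10% of
# non-anger instances; only the base rate of anger differs between the calls.
anger_rare = reverse_inference(reliability=0.3, false_positive_rate=0.1, base_rate=0.1)
anger_common = reverse_inference(reliability=0.3, false_positive_rate=0.1, base_rate=0.5)
print(f"p(anger | scowl), anger rare:   {anger_rare:.2f}")
print(f"p(anger | scowl), anger common: {anger_common:.2f}")
```

Under these made-up inputs, the same scowl supports a much weaker inference of anger when anger is rare in the situation than when it is common, which is one way to see why context matters for reverse inference.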

Four criteria must be met to justify a reverse inference
that a particular facial configuration expresses and therefore
reveals a specific emotional state: reliability, specificity,
generalizability, and validity (explained in Table 2
and Fig. 3). These criteria are commonly encountered in
the field of psychological measurement, and over the
past several decades, there has been an ongoing
dialogue about thresholds for these criteria as they apply
in production and perception studies, with some
consensus emerging for the first three criteria (see Haidt &
Keltner, 1999). Only when a pattern of facial muscle
movements strongly satisfies these four criteria can we
justify calling it an “emotional expression.” If any of these
criteria are not met, then we should instead use neutral,
descriptive terms to refer to a facial configuration
without making unwarranted inferences, simply calling it a
smile (rather than an expression of happiness), a frown























c 

V, 

'> 

w 


a 

a 

w 

a; 

V 

'a 

e3 

> 

W 


■o 

V 

e3 


CJ 

ri 

a 

z 

c« 

H 


c 

o 

a 

a; 

u 

w 

a; 

a 

c 

o 

o 

p 

w 


a 

X 

w 


!U 

T3 

£ 

T3 rH 

'> —T' 

P 


C OJ 

OJ 

0 ) 

I 


c 0 


■O 

V 

■Xj 

^ V o 
^ .g a 


^ a 


_ a; 
a.s2 u 
-Q 

C H „ 
O 

'S 

2 M ti 

3 C 3 

oc -S 


OJ •" 
> 1) 
!U C 

<J o 

s g 

w « 




C3 

03 _C 

£ ^ 

^ c 

b > 
o o 

B i> 

C3 U 


c 

o 

o 

s 

j; 

b 

&JD 

C 

C3 


y a.5P 

VC 

oj:)_g o 

c ^ u 


^ c 

CO 

o; 

JsJ 


b 

• • V 
w V 

C u 
V O 
to 
•Jo 

c a 


a; 

a ' 

</> (U 


■ <U 

-G 

T3 

■ O 

O 

■ ^ ■ 

1 ; 


V -c 

r- 3 

■5 u 

CO 


C3 O 
■£ 

^ V 


o 

^ a; 

-o -Q 


c o 

V _G 

^ CO 


0 

a 

a 

G 

CO 

V 


V 
u 

c 
■o ^ 

OjC ' 

c 

s g 


a 


^ -rl 

> ^ 
g Np 

Qh o 

-sP ^ 

O C 


V 

S a 

CO 

d 

60 

"2 -S 

1 ^ 

! 

o -O 

<S o 

2 "o 
o p 


u 

'x 

O « 

R 2 
_ ^ 
C u 

3 "2 

B 

CO ^ 

g ■> 
2 d 

X- O 

.B- a 
■y a 

^ o 

<J u 
OJ 

a 2 

C6 -d 


si 

a b 

X 0 
V (X) 
a; B 

-f s 

•nl c 

1 0-2 

^ -s te,. 

2 -b £ —n - 
't: c ° d ° 
2S oj ^ 

a &• 2 
.9 d B 

!U a; ^ 

tH S-l 

Qh O aj 
X C it 

(u ^ d 

”7—t "O 

1 

I 

o 32 a 
a ° ^ 

9^ S 


_ -G 
?3 ;g 
71 a; 
2 


V 


T7- CO 

?3 


0 

fX 

2 

u 

G 

0 

B 


X 7b 
1 : 2 : ^ 


X -! “L) 
« O ^ 
CO x: ^ 

CO ^ 


'x 

G 

O ^ 


^ n 7 

X 

7i 


G G ‘l-' 

s a ^ 

a G £ 
cd -C 
C G ^ 
X U X 


G 

'ti ^ d 
"S a ^ 

^-d 9 
o -SS ^ 

B V rB 

C C V 

G n ^ 

08 a 

u J= G 

o da 
o -2 

d ^ O 
*_3 c3 ,:^ 

9 3= 

OJ '-M 
33 d O 
^ 3 






33 


3 

cd 


2 3d 

d H 
^ G 

^ -S 

■f ^ 

c a 

d 8 


P 

o 

60 

2 

7 n 

“ M 

G ^ 
O V 
G pG 

o b 
So 


^ X 

a; 

2 ^ 

OJ ^ 

CO (U 


^ CO 

X I 

b 

V 

> V • 
G u 

I §- 

o "S X 

B co" 73 

X G Tit 


d 2 

I-a 

__ <r3 

p 


C > 

•is 

w 3 

H .a , 
a s£ .— 


3 a 
9 2 

G CO 

d § 


73 

X 


1^ 
> X 
G tU 


G V 
7 7? 

V -9 

r- 73 


o; S 
u Di 

G (jj 
O; _G 
CO "ii 
lU 

^ £ 

^ 0 

^ T3 

s 2 

£ ^ 

C4_ 

G G 

73 

V qj 

£ -Q 


£ 

X 73 

V — 


3 XI 


8 -a 

a “ 

_g a 
u 


^ >;S 

d '3 d 

o 2 -9 

a d 

2 a >, 

§ 1-2 G 

G Cl- g 

G c d 
0^0 
u Q- u 


73 

G 

<7 


X 

7 


73 ^7 

OJ ^ 
Vh qj 

OhX 

0 

G 


■g 

0 

73 

G 

7 


V 7 
U X 
G ^ 
o; ^ 

CO c 
q; V 

a.§ 

, 1 , ^ 

B V 

”9 Cl. 

X 

^ V 


V 

G 

a 

a 

7 

X 


^ o 
§ 

to V 

G 

X 

- G 

O 


u ^ 


u H 
0 . 
V 
<7 


G 

0 

G "£ 
G ^ 
X.2 


B a 
S B 

o 7 

^ X 

c ^ 

S-. 

O r9 

c a 


qj X 

^ X 
u X 
O G 


O 

£ 

03 

G ■• 
03 
> 
X 


X 

G 


7 ^ 


c a 

03 I 0 
G 
O 


03 

U 

I ^ 

U 


G 

O _ 

8 o 

G 03 

b G 
"X <13 
G CO 
O X 
U 7 


2 p 

d o 

^ Cj_ 

o o 

U CO 

X 

■B ^ 

u 03 
,7 —' 
03 

03 9 

X c 
X 

S d 

E; c 

03 03 
X 

■S C 

O 73 


03 


G 7 q; 

G 9 

d .2 "b 

G ^ 

2 £;§ 


— 03 

o .B 

03 ^ 
Qh 03 

C 

-n 0 

3 i 

cr G 

2 a 

y 

X C-H 
.. 7 

D D. 


03 

03 _2 
'j^ "G 


—t 03 
^ X 

s *= 

7 03 

X 5P 
C G 
X > 


^ a 

X 


7 3 

X £ 

^ o 


’9 c 

G 0 
G -a 
G- 0 
03 C 
73 G 
^ 03 


03 

7 "a 

3 S 

U 03 
•G X 
b 03 
7 , 

o 

03 X 


B 


X 

G 

X >’£ 

G C £ 
d q3 73 

G. 03 
O X G 
G 03 

a s a 

o X 

>^ CO 


03 

X 

>. 

7 


S n a 

q; G. [x; 

7 ^ 

8 ^ b: c 

a 7 
03 X 
G 


O 

X 


f .a 

03 U 
X ^ 

^ 03 
u r- 

O ^ 

B u 

X 


a.: 

£ 


73 ■Tl 

di X 

P 03 


^ o 

03 X 


G "G 

.2 § 


M H 


OS 

X 

03 

X 


c 

o _ 

b3 ^ 
^ G 

G u 
03 ^ 

a 7 

73 03 

G ^ 

CO G 


X 

73 


£ o 

G 03 

■_C 


G ^ 
X C 
73 


^ 03 
B 03iD 


03 

X.: 

C 1 

7 1 


V 


G 

u 

775 

G 

CO 

V 

73 

X 

Q- 7 ^ 

73 

B 


G 
03 

03 to 


O 

G 

X 

£ 

£ 

Olj 


o ^ 

8 2 >■ 

“ .a "5 

S'S g 

•d o .a 

.a d c 


"d 

d 


d a 

d d CO 

6 8 .S 

d 3 a 
Cd to -9 

^ 2 .a 

§ d 

11 a 

b d c 

c ^ 2 

9 o -G 

<2 ^ 


If 

§1 

G 03 

a ^ 

b G 

-3 C4_ 

X G 
X ■"■ 
G G 
0 X 

2 
03 3 
u O 
,7 X 

C4_ 

X CO 
G CO 

d 2 

^ I 

O S 

G CO 
^ w 
03 O 

G ff 
d $ 
X 03 

^ c 

co" 3 
CO 03 
03 CO 

-S 

73 03 

7 I- 

CO b* 

"c 
0 o 

ad 

G 

a 

X 03 
03 X 


> 7 

O T3 

X q; 

73 > 


b-g 

V ^ 
S CO 

_ 03 

03 £ 
U 

C 03 
OS CO 

B 

u X 
C G 

B o 
£ -0 
w G 

V 03 
ti a 

V 03 
X -O 


CO*- B 

a c 

•d 2 

2 .a 

CO 

^ X 

s ^ 

u 

7^ "O 

G b 
7 & 
.£ £ 
— 03 
G. CO 

2 5 ; 

"O 7 


03 

X 


X X ! 
a OJD : 
£ £ : 
P ' 


7:3 ; 
V . 

"aJ 


03 

CO 03 


C '1^ 
o CL 

d -S' 

d c 

60 d 


d ^ 

CO 

CO 703 

03 X 73 


G 
0 
u 

d^ 


-2 a 


-2 „ 
d 2 

^ 3 
o 2 
i= -S 

73 

73 
O G 
■ < 


7 

•£ 

4-( 

03 -t:; 

X 


73 V 
X G 
B X 

03 

03 73i 


i ~0 
■ G 

773 


73 


V 


U V 

- SI 

S a d 

2 &■ d 
£ -G 

X CO cj 
C 03 . 


-i CO 

O g 

X .2 

^ X 

-X =5 
u X 
X 0 
'G Cd 
d -a 
Cd d 

CO 03 

■di ^ B 3 
a to -d u 
-d d c 
S’ d 

(U 


33 

73 


773 X 

.3 .a 

V X 


X -g 
.2 > 
~v 


C73 CO 
£ 

X (L) 

b c 
o 


G 
C CO 

03 W 


C3 


CO cj 

C X 

u. u 

G 03 

S 

G. 


S 2 ’ 

X G ' 
G X 
O G ’ 
u u 


G 

O ^ 
u .2 

^ G 

X .£ 



u 

03 

a 


X 

7^ 

N 


V 

c 

G 

o 


X 
-£ 

”03 

'd ^ 
d 

Sr S 

ss 

•S' « 

d .2 
3 -d 

■d 3 

C CO 
^ >. 

S s 

.a « 

CO O 

c h 

8 d 


73 ' 

G 

O 

O 

£ 

03 

X ' 
03 
U 
03 

a 

X 

03 

03 

S £ 

to r- 

c -£ 

a 

03 7 
X g 

O CO 


4 — c 

o 

d OTj' 
03 r“ 

G -S 

V N 
^ r“ 

OJ « 

B X 
c o 

■-' 03 
u 0 ; 


G 

G 

. G 

X X 

s s 

03 *73 
03 CO 

a X 
.£ 

"0 ^ 

S ^ 

7 iri 


CO 


G ^ 
O -d 

U2 
8 2 
9 « 

a qj 
G X 
V _ 
£ £ 

X S 

£ b 


73 


G 
O 

^ ■£ 
X o 

g 

G V 
X 

2 '£ 
u 53 ■ 
,73 X 

'-I-H CO 

7 ctS 


03 

03 - 
03 X 

^ G 
O 


-b a 


03 

a 

X 

.£ 

'03 

X 

c 

o 

CO 

^2 

03 £ 

-B 

4 — ^ 

G 

b .2 

G 

C § 


V 


G 

U 

W 


X 

g "d 

4— B 


G V 

o a 
'£ 

CO V 

£ G 

a-G 
X ^ 

03 G 


^ I 

B S 

7 B 

CO V 




73 

11 
a d 

03 > 


G /-N 

.2 b 


. X 

775 


.£ ' 


'O {1 ) ■ 

8 d 
V X 

E 33 

° p 


c 

o 

to 
03 

B a“ 

X c 


d a 


d ^ 
7 b 

to- g 

a s 

d w 


775 

G 

.2 ' 
o 
B 

03 V 

b CO 
X c 
7 2 

CO 

£ V ' 

X a’ 

>. 7 
7 ti 

s S 

>.x 

G 

o S 

03 03 

X G 

^ 03 


775 

-si 

X 773 

d s a 
^ 2 S. 

d Cd d 


d d 
O to 

li 

-■§ 


CO 


S-( 

G V 


2 g 

G O 
B >' 

73 r- 

C ^ 

G 

O 


U-S -S’ 

.a o d 

■d ^ „ 

d a 2 


CO 

V 03 

M a 

03 X 

4-1 03 

b 03 

4-J _G 

C "G 

G 4_ 

2 ® 
'co 03 

o S 

03 CO 

B ”£ 

£ c 

7 O 


CO 


3 c 

X O 

X 

G O 

o B 

V ^ 
■£ V 


73 
G 
O 

X CO 

O Xi 
G 7 
C 0 ; 
0 ; X 


■, > o 

3 d’^ 

3 .a "d 

> d 9 

.a 8 d 

d S S 


60 g 

§ g 

b 03 

^ X 

03 ^ 

X ^ 
T3 O 
X X 
7 


X 


b8 

o ^ 

X V 


P 7 
03 v- 

a G 

X X 
03 X 
03 2 

X 2 


C 775 

O .B ^ 

G j, c3 

— X 

775 


03 
-7 

^ b 

< o 


3 ^ 

CO 

^ O 

X 


2 ”0 
4—^ iG 

7 

u 03 
G ^ 
X c 
X 03 
£ 

O G 

03 03 

X a 
.2 G 
"03 X 

X c 


b 

X 

”£ 

> 


IS G 

^ B 

CO*' C 

tn •— 

c ^ 

03 X 

■I c 

G ctj 
7 qj 

2 g 


C/} 


.a § 

B' o 

9 a 

b= 03 

^ £ 
a-:! 

o 

P~i b 

b S 

frn V 
q; G 
03 •"■ 
CO oj 

G ■£ 

2 7 

a cu 

C, 03 


CO oj 

B G 
~o 
£ 

03 S 

X — 

G c:3 


O S 
‘X 03 

B 

2 03 
03 "B 
> 03 

•G X 

I * 

a;b 

03 "B 

S I 

03 B 


-§ S 

> 


0 O 


C "o. 
i> .£■ 


•b -ti X 

■2 ,9 to. 


£ 03 ^ 
D Q. G 


2 - 
X £ 


.G ^ 

p3 


73 


"7 X £ 

G rri O 

V ^ ^ 

b 03 03 

2 d 1 

I 2 g 


10 








Facial Expressions of Emotion 


11 


(rather than an expression of sadness), a scowl (rather 
than an expression of anger), and so on.*’ 

The null hypothesis and the role 
of context 

Tests of reliability, specificity, generalizability, and
validity are almost always compared with what would
be expected by sheer chance, if facial configurations
(in studies of expression production) and inferences
about facial configurations (in studies of emotion perception)
occurred randomly with no relation to particular
emotional states. In most studies, chance levels
constitute the null hypothesis. An example of the null
hypothesis for reliability is that people do not scowl
when angry more frequently than would be expected
by chance. If people are observed to scowl more frequently
when angry than they would by chance, then
the null hypothesis is rejected on the basis of the reliability
of the findings. We can also test the null hypothesis
for specificity: If people scowl more frequently
than they would by chance not only when angry but
also when fearful, sad, confused, hungry, and so forth,
then the null hypothesis for specificity is retained.
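
As a rough illustration of testing reliability against chance, the sketch below computes an exact one-sided binomial tail probability using only the Python standard library. The counts, the chance rate of 1/6 (which assumes a coding scheme with six equally likely configurations), and the .05 threshold are all hypothetical choices made for this example; the studies reviewed here define chance in various ways.

```python
from math import comb


def binomial_upper_tail(successes, trials, chance_rate):
    """Exact probability of observing at least `successes` hits in `trials`
    independent episodes if each hit occurs with probability `chance_rate`."""
    return sum(
        comb(trials, k) * chance_rate**k * (1.0 - chance_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 28 scowls observed across 100 induced anger episodes,
# with chance set to 1/6 (an assumed scheme of six equally likely
# configurations).
p_value = binomial_upper_tail(28, 100, 1 / 6)
reject_null = p_value < 0.05  # conventional threshold, also an assumption
print(reject_null)
```

Rejecting the null hypothesis in this way shows only that scowling during anger exceeds the assumed chance rate; scowling on 28% of episodes would still leave most anger episodes without a scowl.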

Tests of generalizability are becoming more common
in the research literature, again using the null hypothesis.
Questions about generalizability test whether a
finding in one experiment is reproduced in other experiments
in different contexts, using different experimental
methods or sampling people from different
populations. There are two crucial questions about
generalizability when it comes to the production and
perception of emotional expressions: Do the findings
from a laboratory experiment generalize to observations
in the real world? And, do the findings from studies
that sample participants from Westernized, educated,
industrialized, rich, and democratic (WEIRD; Henrich,
Heine, & Norenzayan, 2010) populations generalize to
people who live in small-scale remote communities?

Questions of validity are almost never addressed in
production and perception studies. Even if reliable and
specific facial movements are observed across generalizable
circumstances, whether these facial movements
can justify an inference about a person’s emotional state
is a difficult and unresolved question. (We have more
to say about this later.) Consequently, in this article, we
evaluate the common view by reviewing evidence pertaining
to the reliability, specificity, and generalizability
of research findings from production and perception
studies.

When observations allow scientists to reject the null
hypothesis for reliability, defined as observations that
could be expected by chance alone, such evidence
provides necessary but not sufficient support for the
common view of emotional expressions. A slightly
above chance co-occurrence of a facial configuration
and instances of an emotion category, such as scowling
in anger (for example, a correlation coefficient, r, of
about .20 to .39; adapted from Haidt & Keltner, 1999),
suggests that a person sometimes scowls in anger, but
not most or even much of the time. Weak evidence for
reliability suggests that other factors not measured in
the experiment are likely causing people to scowl during
an instance of anger. It also suggests that people
may express anger with facial configurations other than
a scowl, possibly in reliable and predictable ways. Following
common usage, we refer to these unmeasured
factors collectively as context. A similar situation can
be described for studies of emotion perception: When
participants label a scowling facial configuration as
“anger” in a weakly reliable way (between 20% and
39% of the time; Haidt & Keltner, 1999), then this suggests
the possibility of unmeasured context effects.
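
To give a sense of what a correlation in the .20 to .39 range looks like in raw counts, here is a small Python sketch. It assumes that emotion and facial configuration are each coded as present or absent, in which case Pearson's r reduces to the phi coefficient of a 2 x 2 table. The counts are hypothetical, and the function names are illustrative.

```python
from math import sqrt


def phi_coefficient(both, emotion_only, config_only, neither):
    """Pearson correlation between two binary variables (phi), computed from
    the four cell counts of a 2 x 2 table."""
    numerator = both * neither - emotion_only * config_only
    denominator = sqrt(
        (both + emotion_only) * (config_only + neither)
        * (both + config_only) * (emotion_only + neither)
    )
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    return numerator / denominator


def is_weak(r):
    """True if r falls in the .20 to .39 band described in the text."""
    return 0.20 <= r <= 0.39


# Hypothetical counts over 200 episodes: 30 angry and scowling, 70 angry
# without a scowl, 10 scowling without anger, 90 neither.
r = phi_coefficient(30, 70, 10, 90)
print(round(r, 2), is_weak(r))  # 0.25 True
```

In this made-up table the association is above chance, yet 70 of the 100 anger episodes involve no scowl, which is the pattern the text describes as leaving room for unmeasured context.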

In principle, context effects make it possible to test
the common view by comparing it directly with an
alternative hypothesis (that a person’s brain will be
influenced by other causal factors) as opposed to comparing
the findings with those expected by random
chance. It is possible, for example, that a state of anger
is expressed differently depending on various factors
that can be studied, including the situational context
(e.g., whether a person is at work, at school, or at home),
social factors (e.g., who else is present in the situation
and the relationship between the expresser and the
perceiver), a person’s internal physical context (e.g.,
how much sleep they had, how hungry they are), a
person’s internal mental context (e.g., the past experiences
that come to mind or the evaluations they make),
the temporal context (what occurred just a moment
ago), differences between people (e.g., whether someone
is male or female, warm or distant), and the cultural
context, such as whether the expression is occurring in
a culture that values the rights of individuals (compared
with group cohesion) and is open and allows for a
variety of behaviors in a situation (compared with
closed, having more rigid rules of conduct). Other theoretical
approaches offer some of these specific alternative
hypotheses (see Box 2 in the Supplemental Material).
In practice, however, experiments almost always test the
common view against the null hypothesis and rarely test
specific alternative hypotheses. When context is
acknowledged and studied, it is usually examined as a
factor that might moderate a common and universal
emotional expression, preserving the core assumptions
of the common view (e.g., Cordaro et al., 2018; for more
discussion, see Box 3 in the Supplemental Material).

A focus on six emotion categories:
anger, disgust, fear, happiness, 
sadness, and surprise 

Our critical examination of the research literature in
this article focuses primarily on testing the common
view of facial expressions for six emotion categories:
anger, disgust, fear, happiness, sadness, and surprise.
We do not discuss every emotion category ever studied
in the science of emotion. We do not discuss the many
emotion categories that exist in non-English-speaking
cultures, such as gigil (the irresistible urge to pinch or
squeeze something cute) or liget (exuberant, collective
aggression; for discussion of non-English emotion categories,
see Mesquita & Frijda, 1992; Pavlenko, 2014;
Russell, 1991). We do not discuss the various emotion
categories that have been documented throughout history
(e.g., T. W. Smith, 2016). Nor do we discuss every
English emotion category for which a prototypical facial
expression has been suggested. For example, recent
studies motivated primarily by the basic-emotion
approach have suggested that there are “more than six
distinct facial expressions ... in fact, upwards of 20
multimodal expressions” (Keltner et al., 2019, Introduction,
para. 6), meaning that scientists have proposed a
distinct, prototypic facial configuration as the facial
expression for each of 20 or so emotion categories,
including confusion, embarrassment, pride, sympathy,
awe, and others.

We focus on six emotion categories for two reasons.
First, as we already noted, these categories anchor common
beliefs about emotions and their expressions and
therefore represent the clearest, strongest test of the
common view. They can be traced to Charles Darwin,
who stipulated (rather than discovered) that certain
facial configurations are expressions of certain emotion
categories, inspired by photographs taken by Duchenne
(1862/1990) and drawings made by the Scottish anatomist
Charles Bell (Darwin, 1872/1965). The proposed
expressive facial configurations for each emotion category
are presented in Figure 4, and the origin of these
facial configurations is discussed in Box 4 in the Supplemental
Material.

Second, these six emotion categories have been the
primary focus of systematic research for almost a century
and therefore provide the largest corpus of scientific
evidence that can be evaluated. Unfortunately, the
same cannot be said for any of the other emotion categories
in question. This is a particularly important
point when considering the more than 20 emotion categories
that are now the focus of research attention. A
PsycINFO search for the term “facial expression” combined
with “anger, disgust, fear, happiness, sadness,
surprise” produced over 700 entries, but a similar search
including “love, shame, contempt, hate, interest,
distress, guilt” returned fewer than 70 entries (Durán &
Fernández-Dols, 2018). Almost all cross-cultural studies
of emotion perception have focused on anger, disgust,
fear, happiness, sadness, and surprise (plus or minus a
few), and experiments that measure how people spontaneously
move their faces to express instances of emotion
categories rarely include categories beyond these
six. In particular, too few studies measure spontaneous
facial movements during episodes of other emotion
categories (i.e., production studies) to conclude
anything about reliability and specificity, and there
are too few studies of how these additional emotion
categories are perceived in small-scale, remote cultures
to conclude anything about generalizability. In
an era where the generalizability and robustness of
psychological findings are under close scrutiny, it
seemed prudent to focus on the emotion categories
for which there are, by a factor of 10, the largest
number of published experiments. Nonetheless, our
review of the empirical evidence for expressions of
emotion categories beyond anger, disgust, fear, happiness,
sadness, and surprise did not reveal any new
information that weakens the conclusions we discuss
in this article. As a consequence, our discussion here,
which is based on a sample of six emotion categories,
generalizes to those other emotion categories that
have been studied.

Producing Facial Expressions of 
Emotion: A Review of the Scientific 
Evidence 

In this section, we first review the design of a typical
experiment in which emotions are induced and facial
movements are measured. We highlight several observations
to keep in mind as we review the reliability,
specificity, and generalizability for expressions of anger,
disgust, fear, happiness, sadness, and surprise in a
variety of populations, including adults in urban or
small-scale remote cultures, infants and children, and
congenitally blind individuals. Our review is the most
comprehensive to date and allows us to comment on
whether the scientific findings generalize across different
populations of individuals. The value of doing so
becomes apparent when we observe how similar conclusions
emerge from these research domains.

The anatomy of a typical experiment 
designed to observe people’s facial 
movements during episodes of emotion 

In the typical expression-production experiment, scientists
expose participants to objects, images, or events that
they (the scientists) believe will evoke an instance of
emotion. It is possible, in principle, to evoke a wide
variety of instances for a given emotion category (e.g.,
Wilson-Mendenhall, Barrett, & Barsalou, 2015); in practice,
however, published studies evoke what scientists
believe are the most typical instances of each category,
usually elicited with a stimulus that is presented without
context (e.g., a photograph, a short movie clip separated
from the rest of the film, or a simplified description of an
event, such as “your cousin has just died, and you feel
very sad”; Cordaro et al., 2018). Scientists usually include
some measure to verify that participants are in the
expected emotional state (e.g., asking participants to
describe how they feel by rating their experience against
a set of emotion adjectives). They then observe participants’
facial movements during the emotional episode
and quantify how well the measure of emotion predicts
the observed facial movements. When done properly, this
yields estimates of reliability and specificity and, in principle,
provides data to assess generalizability. There are
limitations to assessing the validity of a facial configuration
as an expression of emotion, as we explain below.

Fig. 4. Facial action ensembles for common-view facial expressions. Facial action coding system (FACS) codes can be used to describe
the proposed facial configuration in adults. The proposed expression for anger (a) corresponds to a prescribed emotion FACS (EMFACS)
code for anger (described as AUs 4, 5, 7, and 23). The proposed expression for disgust (b) corresponds to a prescribed EMFACS code for
disgust (described as AU 10). The proposed expression for fear (c) corresponds to a prescribed EMFACS code for fear (AUs 1, 2, and 5 or
5 and 20). The proposed expression for happiness (d) corresponds to a prescribed EMFACS code for the so-called Duchenne smile (AUs
6 and 12). The proposed expression for sadness (e) corresponds to a prescribed EMFACS code for sadness (AUs 1, 4, 11, and 15 or 1, 4,
15, and 17). The proposed expression for surprise (f) corresponds to a prescribed EMFACS code for surprise (AUs 1, 2, 5, and 26). It was
originally proposed that infants express emotions with the same facial configurations as adults. Later research revealed morphological
differences between the proposed expressive configurations for adults and infants. Of a possible 19 proposed configurations for negative
emotions from the infant coding scheme, only 3 were the same as the configurations proposed for adults (Oster, Hegley, & Nagel, 1992).
The proposed expressive prototypes in (g) are adapted from Cordaro, D. T., Sun, R., Keltner, D., Kamble, S., Huddar, N., and McNeil,
G. (2018). Universals and cultural variations in 22 emotional expressions across five cultures. Emotion, 18, 75-93, with permission from
the American Psychological Association. Face photos copyright Dr. Lenny Kristal. The proposed expressive prototypes in (h) are adapted
from Figure 2 in Shariff and Tracy (2011).

Measuring facial movements. Healthy humans have
a common set of 34 muscle groups, 17 on each side of
the face, that contract and relax in patterns. To create
facial movements that are visible to the naked eye, facial
muscles contract, changing the distance between facial
features (Neth & Martinez, 2009) and shaping skin into
folds and wrinkles on an underlying skeletal structure.
Even when facial movements look the same to the naked
eye, there may be differences in their execution under
the skin. There are individual differences in the mechanics
of making a facial movement, including variation in
the anatomical details (e.g., muscle configuration and
relative size vary, and some people lack certain muscle
components), in the neural control of those muscles
(Cattaneo & Pavesi, 2014; Hutto & Vattoth, 2015; Müri,
2015), and in the underlying skeletal structure of the face
(discussed in Box 5 in the Supplemental Material).

There are three common procedures for measuring
facial movements in a scientific experiment. The most
sensitive, objective measure of facial movements, called
facial electromyography (EMG), detects the electrical
activity from actual muscular contractions (again, see
Box 5 in the Supplemental Material). This is a perceiver-independent
way of assessing facial movements that
detects muscle contractions that are not necessarily visible
to the naked eye (Tassinary & Cacioppo, 1992). The
utility of facial EMG is unfortunately offset by its impracticality:
It requires placing electrodes on a participant’s
face in a particular configuration. In addition, a person
can typically tolerate only a few electrodes on the face
at one time. At the writing of this article, relatively few
published articles (we identified 123) reported the use
of facial EMG, the overwhelming majority of which
sparsely sampled the face, measuring the electrical signals
for only a small number of muscles (between one
and six); none of the studies measured naturalistic facial
movements as they occur outside the lab, in everyday
life. Consequently, we focus our discussion on two other
measurement methods. The first is a perceiver-dependent
method that describes visible facial movements, called
facial actions: human coders indicate the presence or
absence of a facial action while viewing video recordings
of participants. The second uses automated methods to
detect facial actions from photographs or videos.

Measuring facial movements with human coders. The
Facial Action Coding System, or FACS (Ekman, Friesen,
& Hager, 2002), is a systematic approach to describe what
a face looks like when facial muscle movements have
occurred. FACS codes describe the presence and intensity
of facial movements. FACS is purely descriptive and
is therefore agnostic about whether those movements
might express emotions or any other mental event.
Human coders train for many weeks to reliably identify
specific movements called action units (AUs). Each AU is
hypothesized to correspond to the contraction of a distinct
facial muscle or a distinct grouping of muscles that
is visible as a specific facial movement. For example, the
raising of the inner corners of the eyebrows (contracting
the frontalis muscle pars medialis) corresponds to AU 1.
Lowering of the inner corners of the brows (activation
of the corrugator supercilii, depressor glabellae, and depressor
supercilii) corresponds to AU 4. AUs are scored and
analyzed as independent elements, but the underlying
anatomy of many facial muscles constrains them so that
they cannot move independently of one another, which
generates dependencies between AUs (e.g., see Hao,
Wang, Peng, & Ji, 2018). A list of facial AUs and their
corresponding facial muscles can be found in Figure 5.
Expert FACS coders approach interrater reliabilities of
.80 for individual AUs (Jeni, Cohn, & De la Torre, 2013).

The first version of FACS (Ekman & Friesen, 1978)
was based largely on the work of Swedish anatomist
Carl-Herman Hjortsjö, who catalogued the facial configurations
described by Duchenne (Hjortsjö, 1969). In
addition to the updated versions of FACS (Ekman et al.,
2002), other facial coding systems have been devised
for human infants (Izard et al., 1995; Oster, 2007),
chimpanzees (Vick, Waller, Parr, Smith Pasqualini, &
Bard, 2007) and macaque monkeys (L. A. Parr, Waller,
Burrows, Gothard, & Vick, 2010; see also L. F. Barrett,
2017a). Figure 4 displays the common FACS codes for
the configurations of the facial movements that have
been proposed as the prototypic expressions of anger,
disgust, fear, happiness, sadness, and surprise, respectively.
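
The prototype codes listed in the Fig. 4 caption can be written down as sets of AUs, which makes it easy to check a coded face against them. The Python sketch below is illustrative only: the dictionary transcribes the AU combinations from the caption, but the subset-matching rule and the function name are assumptions made for this example, not the scoring procedure of EMFACS or of any study reviewed here.

```python
# AU combinations transcribed from the Fig. 4 caption; each category may have
# more than one proposed variant.
EMFACS_PROTOTYPES = {
    "anger": [{4, 5, 7, 23}],
    "disgust": [{10}],
    "fear": [{1, 2, 5}, {5, 20}],
    "happiness": [{6, 12}],
    "sadness": [{1, 4, 11, 15}, {1, 4, 15, 17}],
    "surprise": [{1, 2, 5, 26}],
}


def matching_prototypes(observed_aus):
    """Return the category labels whose proposed AU set is fully contained in
    the observed AUs (an assumed matching rule, for illustration only)."""
    observed = set(observed_aus)
    return sorted(
        category
        for category, variants in EMFACS_PROTOTYPES.items()
        if any(variant <= observed for variant in variants)
    )


print(matching_prototypes({1, 2, 5, 26}))  # ['fear', 'surprise']
print(matching_prototypes({6, 12}))        # ['happiness']
```

Under this simple rule, a face coded with AUs 1, 2, 5, and 26 satisfies the proposed configurations for both fear and surprise. The labels describe which proposed prototype a configuration resembles; they say nothing about the person's emotional state.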

Measuring facial movements with automated algorithms.
Human coders require time-consuming, intensive
training and practice before they can reliably assign AU
codes. After training, coding photographs or videos frame
by frame is a slow process, which makes human FACS
coding impractical to use on facial movements as they
occur in everyday life. Large inventories of naturalistic
photographs and videos, which have been curated only
fairly recently (Benitez-Quiroz, Srinivasan, & Martinez,
2016), would require decades to manually code. This
problem is addressed by automated FACS coding systems
using computer-vision algorithms (Martinez, 2017;
Martinez & Du, 2012; Valstar, Zafeiriou, & Pantic, 2017).
Recently developed computer vision systems have automated
the coding of some (but not all) facial AUs (e.g.,
Benitez-Quiroz, Srinivasan, & Martinez, 2018; Benitez-Quiroz,
Wang, & Martinez, 2017; Chu, De la Torre, &
Cohn, 2017; Corneanu, Simon, Cohn, & Guerrero, 2016;
Essa & Pentland, 1997; Martinez, 2017a; Martinez & Du,
2012; Valstar et al., 2017; see Box 6 in the Supplemental
Material), making it more feasible to observe facial movements
as they occur in everyday life, at least in principle
(see Box 7 in the Supplemental Material).

AU   Description            Facial Muscles (Type of Activation)
1    Inner Brow Raiser      Frontalis (pars medialis)
2    Outer Brow Raiser      Frontalis (pars lateralis)
4    Brow Lowerer           Corrugator supercilii, depressor supercilii
5    Upper-Lid Raiser       Levator palpebrae superioris
6    Cheek Raiser           Orbicularis oculi (pars orbitalis)
7    Lid Tightener          Orbicularis oculi (pars palpebralis)
9    Nose Wrinkler          Levator labii superioris alaeque nasi
10   Upper-Lip Raiser       Levator labii superioris
11   Nasolabial Deepener    Zygomaticus minor
12   Lip-Corner Puller      Zygomaticus major
13   Cheek Puffer           Levator anguli oris
14   Dimpler                Buccinator
15   Lip-Corner Depressor   Depressor anguli oris
16   Lower-Lip Depressor    Depressor labii inferioris
17   Chin Raiser            Mentalis
18   Lip Puckerer           Incisivii labii superioris and incisivii labii inferioris
20   Lip Stretcher          Risorius with platysma
22   Lip Funneler           Orbicularis oris
23   Lip Tightener          Orbicularis oris
24   Lip Pressor            Orbicularis oris
25   Lips Part              Depressor labii inferioris or relaxation of mentalis or orbicularis oris
26   Jaw Drop               Masseter, relaxed temporalis and internal pterygoid
27   Mouth Stretch          Pterygoids, digastric
28   Lip Suck               Orbicularis oris
41   Lid Droop
42   Slit
43   Eyes Closed
44   Squint
45   Blink
46   Wink

Fig. 5. Facial Action Coding System (FACS; Ekman & Friesen, 1978) codes for adults. AU = action unit.

Automated FACS coding is accurate (> 90%) compared
with coding from expert human coders, provided
that the images were captured under ideal laboratory
conditions, where faces are viewed from the front, are
well illuminated, are not occluded, and are posed in a
controlled way (Benitez-Quiroz et al., 2016). (It is
important to note, however, that “accuracy” here is defined
as agreement with the FACS coding produced by human
judges, which may itself contain errors.) Under ideal conditions,
accuracy is highest (~99%) when algorithms are tested
and trained on images from the same database (Benitez-Quiroz
et al., 2016). The best of these algorithms works
quite well when trained and tested on images from
different databases (~90%), as long as the images are
all taken in ideal conditions (Benitez-Quiroz et al.,
2016). Accuracy (compared with human FACS coding)
decreases substantially when coding facial actions in
still images or in video frames taken in everyday life,
in which conditions are unconstrained and facial configurations
are not stereotypical (e.g., Yitzhak et al.,
2017). For example, 38 automated FACS coding algorithms
were recently trained on 1 million images (the
2017 EmotioNet Challenge; Benitez-Quiroz, Srinivasan,
Feng, Wang, & Martinez, 2017) and evaluated against
separate test images that were FACS coded by experts.
In these less constrained conditions, accuracy dropped
below 83%, and a combined measure of precision and
recall (a measure called F1, ranging from zero to one)
was below .65 (Benitez-Quiroz, Srinivasan, et al., 2017).
These results indicate that current algorithms are not
accurate enough in their detection of facial AUs to fully
substitute for expert coders when describing facial
movements in everyday life. Nonetheless, these algorithms
offer a distinct practical advantage because they
can be used in conjunction with human coders to speed
up the study of facial configurations in millions of
images in the wild. It is likely that automated methods
will continue to improve as better and more robust
algorithms are developed and as more diverse face
images become available.
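
Accuracy and a combined precision-recall score can diverge sharply when an AU is present in only a small share of images. The Python sketch below illustrates this, assuming the combined measure is the standard F1 score (the harmonic mean of precision and recall). The counts are hypothetical and do not come from the EmotioNet Challenge or any other evaluation cited here.

```python
def detection_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for one AU, treating expert human
    FACS codes as the reference labels."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a rare AU in 1,000 images: the AU is present in 70,
# the algorithm finds 30 of them and raises 20 false alarms.
m = detection_metrics(tp=30, fp=20, fn=40, tn=910)
print({name: round(value, 2) for name, value in m.items()})
# {'accuracy': 0.94, 'precision': 0.6, 'recall': 0.43, 'f1': 0.5}
```

In this made-up case the algorithm agrees with the human codes on 94% of images mostly because the AU is usually absent, while the F1 score of .50 shows that it misses or mislabels many of the images in which the AU actually occurs.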

Measuring an emotional state. Once an approach has 
been chosen for measuring facial movements, a clear test 
of the common view of emotional expressions depends 
on having valid measures that reliably and specifically 
characterize, in a generalizable way, the instances of each 
emotion category to which the measurements of facial 
muscle movements can be compared. The methods that 
scientists use to assess people’s emotional states vary in 
their dependence on human inference, however, which 
raises questions about the validity of the measures. 

Relatively objective measures of an emotional instance. 
The more objective end of the measurement spectrum 
includes assessing emotions with dynamic changes in 
the autonomic nervous system (ANS), such as cardiovascular, 
respiratory, or perspiration changes (measured as 
variations in skin conductance), and dynamic changes 
in the central nervous system, such as changes in blood 
flow or electrical activity in the brain. These measures 
are thought to be more objective because the measurements 
themselves (assigning the numbers) do not require 
a human judgment (i.e., the measurements are 
perceiver-independent). Only the interpretation of the measurements 
(their psychological meaning) requires human 
inference. For example, a human observer does not judge 
whether skin conductance or neural activity increases or 
decreases; human judgment comes into play when the 
measurements are interpreted for the emotional meaning. 


Currently, there are no objective measures, either 
singly or as a pattern, that reliably, uniquely, and replicably 
identify an instance of one emotion category 
compared with an instance of another. Statistical summaries 
of hundreds of experiments (i.e., meta-analyses) 
show, for example, that currently there is no reliable 
relationship between an emotion category, such as 
anger, and a specific set of physical changes in the ANS 
that accompany the instances of that category, even 
probabilistically (the most comprehensive study published 
to date is Siegel et al., 2018, but for earlier studies, 
see Cacioppo, Berntson, Larsen, Poehlmann, & Ito, 
2000; Stemmler, 2004; also see Box 8 in the Supplemental 
Material). In anger, for example, skin conductance can go 
up, go down, or stay the same (i.e., changes in skin conductance 
are not consistently associated with anger). And 
a rise in skin conductance is not unique to instances of 
anger; it also can occur during a range of other emotional 
episodes (i.e., changes in skin conductance do not specifically 
occur in anger and only in anger). 

Individual studies often report patterns of ANS measures 
that distinguish an instance of one emotion category 
from another, but those patterns do not replicate: 
They vary from study to study, even 
when studies (a) use the same methods and stimuli and 
(b) sample from the same population of participants 
(e.g., compare findings from Kragel & LaBar, 2013, with 
those from Stephens, Christie, & Friedman, 2010). Similar 
within-category variation is routinely observed for 
changes in neural activity measured with brain imaging 
(Lindquist, Wager, Kober, Bliss-Moreau, & Barrett, 2012) 
and single-neuron recordings (Guillory & Bujarski, 
2014). For example, pattern-classification studies discover 
multivariate patterns of activity across the brain 
for emotion categories such as anger, sadness, fear, and 
so on, but these patterns are not replicable from study 
to study (e.g., compare Kragel & LaBar, 2015; Saarimaki 
et al., 2016; Wager et al., 2015; for a discussion, see 
Clark-Polner, Johnson, & Barrett, 2017). This observed 
variation does not imply that biological variability during 
emotional episodes is random; rather, it may be 
context-dependent (e.g., the yellow and green zones 
of Fig. 1). It may also be the case that current biological 
measures are simply insufficiently sensitive or comprehensive 
to capture situated variation in a precise way. 
If this is so, then such variation should be considered 
unexplained rather than random. 

It is worth pointing out the difficult circularity built 
into these studies, which we encounter again a few paragraphs 
down: Scientists must use some criterion for 
identifying when instances of an emotion category are 
present in the first place (so as to draw conclusions 
about whether emotion categories can be distinguished 
by different patterns of physical measurements). In 
most studies that attempt to find bodily or neural 
“signatures” of emotions, the criterion is subjective: It 
is either reported by the participants or provided by 
the scientist, which introduces problems of its own, 
as we discuss in the next section. 

Subjective measures of an emotional instance. Without 
objective measures to identify the emotional state 
of a participant, scientists typically rely on the relatively 
more subjective measures that anchor the other end of 
the measurement spectrum. The subjective judgments 
can come from the participants (who complete self-report 
measures), from other observers (who infer emotion in the 
participants), or from the scientists themselves (who use a 
variety of criteria, including common sense, to infer the 
presence of an emotional episode). These are all examples 
of perceiver-dependent measurements because 
the measurements themselves, as well as their interpretation, 
rely directly on human inference. 

Scientists often rely on their own judgments and 
intuitions (as Charles Darwin did) to stipulate when an 
emotion is present or absent in participants. For example, 
snakes and spiders are said to evoke fear. So are 
situations that involve escaping from a predator. Sometimes 
scientists stipulate that certain actions indicate 
the presence of fear, such as freezing or fleeing or even 
attacking in defense. The validity of the conclusions 
that scientists draw about emotions depends on the 
validity of their initial assumptions. 

Inferences about emotional episodes can also come 
from other people—for example, independent samples 
of study participants, who categorize the situations in 
which facial movements are observed. Scientists can 
also ask observers to infer when participants are emotional 
by having them judge subjects’ behavior or tone 
of voice (e.g., see our later discussion of Camras et al., 
2007, in the section on infants and children). 

Another common strategy for identifying the emotional 
state of participants is simply to ask them what 
they are experiencing. Their self-reports of emotional 
experience then become the criteria for deciding 
whether an emotional episode is present or absent. 
Self-reports are often considered imperfect measures 
of emotion because they depend on subjective judgments 
and beliefs and require translation into words. 
In addition, people can experience an emotional event 
yet be unaware of it (i.e., conscious with no self- 
awareness) or unable to express emotion with words 
(a condition called alexithymia) and therefore unable 
to report on it. Despite questions about their validity, 
self-reports are the most common measure of emotion 
that scientists compare with facial AUs. 

Human inference and assessing the presence of an 
emotional state. At this point, it should be obvious that 
any measure of an emotional state itself requires some 
degree of human inference; what varies is the amount 
of inference that is required. Herein lies a problem: To 
properly test the hypothesis that certain facial movements 
reliably and specifically express emotion, scientists 
(ironically) must first make a reverse inference that 
an emotional event is occurring; that is, they infer the 
emotional instance by observing changes in the body, 
brain, and behavior. Or they infer (a reverse inference) 
that an event or object evokes an instance of a specific 
emotion category (e.g., an electric shock elicits fear but 
not irritation, curiosity, or uncertainty). These reverse 
inferences are scientifically sound only if measures of 
emotion reliably, specifically, and validly characterize the 
instances of the emotion category. So, any clear, scientific 
test of the common view of emotional expressions rests 
on a set of more basic inferences about whether an emotional 
episode is present or absent, and any conclusions 
that come from such a test are only as sound as those 
basic inferences. (It is, of course, also possible simply to 
stipulate the emotion: For instance, a researcher could 
choose to define fear as the set of internal states caused 
by electric shock, an approach that becomes tautological 
if not further constrained.) 

If all measures of emotion rest on human judgment 
to some degree, then, in principle, a scientist cannot 
be sure that an emotional state is present independently 
of that judgment, which in turn limits the observer- 
independent validity of any experiment designed to test 
whether a facial configuration validly expresses a specific 
emotion category. All face-emotion associations 
that are observed in an experiment reflect human 
consensus, that is, the degree of agreement between 
self-judgments (from the participants), expert judgments 
(from the scientist), and/or judgments from other 
observers (perceivers who are asked to infer emotion 
in the participants). These types of agreement are often 
referred to as accuracy, but this label may or may not be valid. 
We touch on this point again when we discuss studies 
that test whether certain facial configurations are routinely 
perceived as expressions of specific emotion 
categories. 

Testing the common view of emotional expressions: 
interpreting the scientific observations. If a specific 
facial configuration reliably expresses instances of a certain 
emotion category in any given experiment, then we 
would expect measurements of the face (e.g., facial AU 
codes) to co-occur with other measures that indicate that 
participants are in the target emotional state. In principle, 
those measures might be more objective (e.g., ANS 
changes during an emotional event) or they might be 
more subjective (e.g., ratings provided by the participants 
themselves). In practice, however, the vast majority of 
experiments compare facial movements with subjective 
measures of emotion: a scientist’s judgment about which 
emotions are likely to be evoked by a particular stimulus, 
the judgments of other human observers about participants’ 
emotional states, or participants’ self-reports of 
emotional experience. For example, in an experiment, 
scientists might ask questions like these: Do the AUs that 
create a scowling facial configuration co-occur with 
self-reports of feeling angry? Do the AUs that create a pouting 
facial configuration co-occur with perceivers’ judgments 
that participants are sad? Do the AUs that create a 
wide-eyed, gasping facial configuration occur when people 
are exposed to an electric shock? If such observations 
suggest that a configuration of muscle movements is reliably 
observed during episodes of a given emotion category, 
then those movements are said to express the 
emotion in question. As we will see, many studies show 
that some facial configurations occur more often than 
random chance would allow but are not observed with a 
high degree of reliability (according to the criteria from 
Haidt and Keltner, 1999, explained in Table 2 of the current 
article). 

If a facial configuration specifically (i.e., uniquely) 
expresses instances of a certain emotion category in 
any given experiment, then we would expect to observe 
little co-occurrence between measurements of the face 
and measurements indicating the presence of emotional 
instances from other categories (again, see Table 2 and 
Fig. 3). 
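
The two quantities just described can be sketched in a few lines of Python. The class name, function names, and counts below are hypothetical and serve only to make the definitions concrete: reliability is taken as the proportion of target-emotion episodes in which the configuration is observed, and the false-positive rate as the proportion of other-emotion episodes in which it is observed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoOccurrenceCounts:
    """Counts of coded episodes in one hypothetical experiment."""
    config_in_target: int     # configuration observed, target emotion measured
    no_config_in_target: int  # configuration absent, target emotion measured
    config_in_other: int      # configuration observed, another emotion measured
    no_config_in_other: int   # configuration absent, another emotion measured


def reliability(counts):
    """Proportion of target-emotion episodes showing the configuration."""
    total = counts.config_in_target + counts.no_config_in_target
    if total == 0:
        raise ValueError("no target-emotion episodes were observed")
    return counts.config_in_target / total


def false_positive_rate(counts):
    """Proportion of other-emotion episodes showing the configuration."""
    total = counts.config_in_other + counts.no_config_in_other
    if total == 0:
        raise ValueError("no other-emotion episodes were observed")
    return counts.config_in_other / total


# Illustrative counts only (not from any study): scowls coded during
# 100 anger episodes and during 100 episodes of other emotion categories.
scowl = CoOccurrenceCounts(30, 70, 20, 80)
print(f"reliability = {reliability(scowl):.2f}")
print(f"false-positive rate = {false_positive_rate(scowl):.2f}")
```

With these made-up counts, the scowl appears in 30% of anger episodes but also in 20% of non-anger episodes, so a reliability estimate reported on its own would say little about whether the configuration is specific to anger.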

If a configuration of facial movements is observed 
in instances of a certain emotion category in a reliable, 
specific way within an experiment, so that we can infer 
that the movements are expressing an instance of the 
emotion in that study as hypothesized, then scientists 
can safely infer that the facial movements in question 
are an expression of that emotion category’s instances 
in that situation. One more step is required before we 
can infer that the facial configuration is the expression 
of that emotion: We must observe a similar pattern of 
facial configuration-emotion co-occurrences across different 
experiments, to some extent generalizing across 
the specific measures and methods used and the participants 
and contexts sampled. If the facial configuration- 
emotion co-occurrences replicate across experiments 
that sample people from the same culture, then the 
facial configuration in question can reasonably be 
referred to as an emotional expression only in that 
culture; for example, if a scowling facial configuration 
co-occurs with measures of anger (and only anger) 
across most studies conducted on adult participants in 
the United States who are free from illness, then it is 
reasonable to refer to a scowl as an expression of 
anger in healthy adults in the United States. If facial 
configuration-emotion co-occurrences generalize across 
cultures (that is, if they are replicated across experiments 
that sample a variety of instances of that emotion 
category in people from different cultures), then the 
facial configuration in question can be said to universally 
express the emotion category in question. 

Studies of healthy adults from the United 
States and other developed nations 

We now review the scientific evidence from studies that 
document how people spontaneously move their facial 
muscles during instances of anger, disgust, fear, happiness, 
sadness, and surprise, as well as how they pose 
their faces when asked to indicate how they express 
each emotion category. We examine evidence gathered 
in the lab and in naturalistic settings, sampling healthy 
adults who live in a variety of cultural contexts. To 
evaluate the reliability, specificity, and generalizability 
of the scientific findings, we adapted criteria set out by 
Haidt and Keltner (1999), as discussed in Table 2. 

Spontaneous facial movements in laboratory studies. 
A meta-analysis was recently conducted to test the 
hypothesis that the facial configurations in Figure 4 co-occur, 
as hypothesized, with the instances of specific 
emotion categories (Duran, Reisenzein, & Fernandez- 
Dols, 2017). Thirty-seven published articles reported on 
how people moved their faces when exposed to objects 
or events that evoke emotion. Most studies included in 
the meta-analysis were conducted in the laboratory. The 
findings from these experiments were statistically summarized 
to assess the reliability of facial movements as 
expressions of emotion (see Fig. 6). In all emotion categories 
tested, other than fear, participants moved their 
facial muscles into the expected configuration more reliably 
than what we would expect by chance. Reliability 
levels were weak, however, indicating that the proposed 
facial configurations in Figure 4 have limited reliability 
(and to some extent, limited generalizability; i.e., a scowling 
facial configuration is an expression of anger, but not 
the expression of anger). More often than not, people 
moved their faces in ways that were not consistent with 
the hypotheses of the common view. An expanded version 
of this meta-analysis (Duran & Fernandez-Dols, 
2018) analyzed 131 effect sizes from 76 studies totaling 
4,487 participants, with similar results: The hypothesized 
facial configurations were observed with average effect 
sizes (r) of .31 for the correlation between the intensity 
of a facial configuration and a measure of anger, disgust, 
fear, happiness, sadness, or surprise (corresponding to 
weak evidence of reliability; individual correlations for 
specific emotion categories ranged from .06 to .45, interpreted 
as no evidence of reliability to moderate evidence 
of reliability). The average proportion of the times 
that a facial configuration was observed during an 
emotional event (in one of those categories) was .22 
(proportions for specific emotion categories ranged from 
.11 to .35, interpreted as no evidence to weak evidence 
of reliability). 

Fig. 6. Meta-analysis of facial movements during emotional episodes: a summary of effect sizes across studies (Duran, 
Reisenzein, & Fernandez-Dols, 2017). Effect sizes are computed as correlations or proportions (as reported in the original 
experiments). Results include experiments that reported a correspondence between a facial configuration and its hypothesized 
emotion category as well as those that reported a correspondence between individual AUs of that facial configuration 
and the relevant emotion category; meta-analytic effect sizes that summarized only the effects for entire ensembles of AUs 
(the facial configurations specified in Fig. 4) were even lower than those reported here. 

No overall assessment of specificity was reported in 
either the original or the expanded meta-analysis because 
most published studies do not report the false-positive 
rate (i.e., the frequency with which a facial AU is 
observed when an instance of the hypothesized emotion 
category was not present; see Fig. 3). Nonetheless, some 
striking examples of specificity failures have been documented 
in the scientific literature. For example, a certain 
smile, called a Duchenne smile, is defined in terms of 
facial muscle contractions (i.e., in terms of facial morphology): 
It involves movement of the orbicularis oculi, 
which raises the cheeks and causes wrinkles at the outer 
corners of the eyes, in addition to movement of the 
zygomaticus major, which raises the corners of the lips 
into a smile. A Duchenne smile is thought to be a spontaneous 
expression of authentic happiness. Research 
shows, however, that a Duchenne smile can be intentionally 
produced when people are not happy (Gunnery & 
Hall, 2014; Gunnery, Hall, & Ruben, 2013; also see 
Krumhuber & Manstead, 2009), consistent with evidence 
that Duchenne smiles often occur when people are signaling 
submission or affiliation rather than reflecting 
happiness (Rychlowska et al., 2017). 

Spontaneous facial movements in naturalistic settings. 
Studies of facial configuration-emotion category 
associations in naturalistic settings tend to yield results 
similar to those from studies that were conducted in more 
controlled laboratory settings (Fernandez-Dols, 2017; 
Fernandez-Dols & Crivelli, 2013). Some studies observe 
that people express emotions in real-world settings by 
spontaneously making the facial muscle movements proposed 
in Figure 4, but such observations are generally not 
replicable across studies (e.g., cf. Matsumoto & Willingham, 
2006, and Crivelli, Carrera, & Fernandez-Dols, 2015; cf. 
Rosenberg & Ekman, 1994, and Fernandez-Dols, Sanchez, 
Carrera, & Ruiz-Belda, 1997). For example, two field 
studies of winning judo fighters recently demonstrated 
that so-called Duchenne smiles were better predicted by 
whether an athlete was interacting with an audience than 
by the degree of happiness reported after winning their 
matches (Crivelli et al., 2015). Only 8 of the 55 winning 
fighters produced a Duchenne smile in Study 1; all occurred 
during a social interaction. Only 25 of 119 winning fighters 
produced a Duchenne smile in Study 2, documenting, at 
best, weak evidence for reliability. 
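
The arithmetic behind this reading can be sketched as follows, using the reliability bands given in the note to Table 3 (adapted from Haidt & Keltner, 1999). The function name is hypothetical, and the exact cut points between bands (for instance, how a value between 40% and 41% is treated) are our simplifying assumption rather than part of the published criteria.

```python
def evidence_label(proportion):
    """Map a reliability proportion onto the bands in the note to Table 3.

    Assumed cut points: below .20 unsupported, .20 up to .41 weak,
    .41 up to .70 moderate, .70 and above strong.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    if proportion < 0.20:
        return "unsupported"
    if proportion < 0.41:
        return "weak"
    if proportion < 0.70:
        return "moderate"
    return "strong"


# Counts reported above for the two judo field studies (Crivelli et al., 2015).
for study, smiles, winners in [("Study 1", 8, 55), ("Study 2", 25, 119)]:
    p = smiles / winners
    print(f"{study}: {smiles}/{winners} = {p:.3f} -> {evidence_label(p)}")
```

Under these assumed cut points, Study 1 (about 15%) falls below the threshold for weak evidence and Study 2 (about 21%) falls just inside it.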

Posed facial movements. Another source of evidence 
comes from asking participants sampled from various 
cultures to deliberately pose the facial configurations that 
they believe they use to express emotions. In these studies, 
participants are given a single emotion word or a 
single, brief statement to describe each emotion category 
and are then asked to freely pose the facial configuration 
that they believe they make when expressing that emotion. 
Such research directly examines common beliefs 
about emotional expressions. For example, one study 
provided college students from Canada and Gabon (in 
Central Africa) with dictionary definitions for 10 emotion 
categories. After practicing in front of a mirror, participants 
posed the facial configurations so that “their friends 
would be able to understand easily what they feel” 
(Elfenbein, Beaupre, Levesque, & Hess, 2007, p. 134) and 
their poses were FACS coded. Likewise, a recent study 
asked college students in China, India, Japan, Korea, and 
the United States to pose the facial movements they 
believe they make when expressing each of 22 emotion 
categories (Cordaro et al., 2018). Participants heard a 
brief scenario describing an event that might cause anger 
(“You have been insulted, and you are very angry about 
it”) and then were instructed to pose a facial (and nonverbal 
but vocal) expression of emotion, as if the events 
in the scenario were happening to them. Experimenters 
were present in the testing room as participants posed 
their responses. Both studies found moderate to strong 
evidence that participants across cultures share common 
beliefs about the expressive pose for anger, fear, and surprise 
categories; there was weak to moderate evidence 
for the happiness category, and weak evidence for the 
disgust and sadness categories (Fig. 7). Cultural variation 
in participants’ beliefs about emotional expressions was 
also observed. 

Neither study compared participants’ posed expressions 
(their beliefs about how they move their facial 
muscles to express emotions) with observations of how 
they actually moved their faces when expressing emotion. 
Nonetheless, a quick comparison of the findings 
from the two studies and the proportions of spontaneous 
facial movements made during emotional events 
(from the Duran et al., 2017 meta-analysis) makes it 
clear that posed and spontaneous movements differ, 
sometimes quite substantially (again, see Fig. 7). When 
people pose a facial configuration that they believe 
expresses an emotion category, they make facial movements 
that more reliably agree with the hypothesized 
facial configurations in Figure 4. The same cannot be 
said of people’s spontaneous facial movements during 
actual emotional episodes, however (for convergent evidence, 
see Motley & Camden, 1988; Namba, Makihara, 
Kabir, Miyatani, & Nakao, 2016). One possible interpretation 
of these findings is that posed and spontaneous 
facial-muscle configurations correspond to distinct communication 
systems. Indeed, there is some evidence 
that volitional and involuntary facial movements are 
controlled by different neural circuits (Rinn, 1984). 
Another factor that may contribute to the discrepancy 
between posed and spontaneous facial movements is 
that people’s beliefs about their own behavior often 
reflect their stereotypes and do not necessarily correspond 
to how they actually behave in real life (see 
Robinson & Clore, 2002). Indeed, if people’s beliefs, as 
measured by their facial poses, are influenced directly 
by the common view, then any observed relationship 
between posed facial expressions and hypothesized 
emotion categories is merely evidence of the beliefs 
themselves. 

Summary. Our review of the available evidence thus 
far is summarized in the first through third data rows in 
Table 3. The hypothesized facial configurations presented 
in Figure 4 spontaneously occur with weak reliability 
during instances of the predicted emotion category, suggesting 
that they sometimes serve to express the predicted 
emotion. Furthermore, the specificity of each 
facial configuration as an expression of an emotion category 
is largely unknown (because most studies do not 
report it). In our view, this pattern of 
findings is most compatible with the interpretation that 
hypothesized facial configurations are not observed reliably 
or specifically enough to justify using them to infer 
a person’s emotional state, whether in the lab or in everyday 
life. We are not suggesting that facial movements are 
meaningless and devoid of information. Instead, the data 
suggest that the meaning of any set of facial movements 
may be much more variable and context-dependent than 
hypothesized by the common view. 

Studies of healthy adults living in 
small-scale, remote cultures 

The emotion categories that are at the heart of the common 
view (anger, disgust, fear, happiness, sadness, and 
surprise) derive from modern U.S. English (Wierzbicka, 
2014), and their proposed expressions (in Fig. 4) derive 
from observations of people who live in urbanized, 
Western settings. Nonetheless, it is hypothesized that 
these facial configurations evolved as emotion-specific 
expressions to signal socially relevant emotional information 
(Shariff & Tracy, 2011) in the challenging 
situations that originated in our hunting-and-gathering 
hominin ancestors who lived on the African savannah 
during the Pleistocene era (Pinker, 1997; Tooby & 
Cosmides, 1990). It is further hypothesized that these 
facial configurations should therefore be observed during 
instances of the predicted emotion categories with 
strong reliability and specificity in people around the 
world, although the facial movements might be slightly 
modified by culture (Cordaro et al., 2018; Ekman, 1972). 
The strongest test of these hypotheses would be to 
sample participants who live in remote parts of the 
world with relatively little exposure to Western cultural 
norms, practices, and values (Henrich et al., 2010; 
Norenzayan & Heine, 2005) and observe their facial 
movements during emotional episodes. In our evaluation 
of the evidence, we continued to use the criteria 
summarized by Haidt and Keltner (1999; see Table 2 in 
the current article). 

Fig. 7. Comparing posed and spontaneous facial movements. Correlations or proportions are presented for anger, 
disgust, fear, happiness, sadness, and surprise, separately for three studies. Data are from Table 6 in Cordaro et al. 
(2018), from Elfenbein, Beaupre, Levesque, and Hess (2007; reliability for the anger category is for AU4 + AU5 only), 
and from Duran, Reisenzein, and Fernandez-Dols (2017; proportion data only). 

Spontaneous facial movements in naturalistic settings. 
Our review of scientific studies that systematically 
measure the spontaneous facial movements in people of 
small-scale, remote cultures is brief by necessity: There 
are no such studies. At the time of publication, we were 
unable to identify even a single published report or manuscript 
registered on open-access, preprint services that 
measured facial muscle movements in people of remote 
cultures as they experienced emotional events. Scientists 
have almost exclusively observed how people from remote 
cultures label facial configurations as emotional 
expressions (i.e., studying emotion perception, not production) 
to test the hypothesis that certain facial configurations 
evolved to express certain emotion categories in a 
reliable, specific, and generalizable (i.e., universal) manner. 
Later in this article, we return to this issue and discuss 
ner. Later in this article, we return to this issue and discuss 
the findings from these emotion-perception studies. 

There are nonetheless several descriptive reports that 
provide support for the common view of universal emotional 
expressions (similar to what Valente, Theurel, & 
Gentaz, 2018, refer to as an “observational approach”). 
For example, the U.S. psychologist Paul Ekman and 
colleagues curated an archive of photographs of the 
Fore hunter-gatherers taken during his visits to Papua 
New Guinea in the 1960s (Ekman, 1980). The photographs 
were taken as people went about their daily 
activities in the small hamlets of the eastern highlands 
of Papua New Guinea. Ekman used his knowledge of 
the situation in which each photograph was taken to 
assign each facial configuration to an emotion category, 
leading him to conclude that the Fore expressed emotions 
with the proposed facial configurations shown in 
Figure 4. Yet different scientific methods yielded a contrasting 
conclusion. When Trobriand Islanders living in 
Papua New Guinea were asked to infer emotions in 
facial configurations by labeling these same photographs 
in their native language, both by freely offering 
words and by choosing the best fitting emotion word 
from a list of nine choices, they did not label the facial 
configurations as proposed by Ekman and colleagues 
at above-chance levels (Crivelli, Russell, Jarillo, & 
Fernandez-Dols, 2017). In fact, the proposed fear 
expression (the wide-eyed, gasping face) is reliably 
interpreted as an expression of threat (intent to harm) 
and anger by the Maori of New Zealand and by the 
Trobriand Islanders in Papua New Guinea (Crivelli, 
Jarillo, & Fridlund, 2016). 

Table 3. Reliability and Specificity: A Summary of the Evidence 

Study type                                                          Reliability             Specificity

Expression production
  Adults, developed, spontaneous, lab                               Weak                    Unknown
  Adults, developed, spontaneous, naturalistic                      Weak                    Unknown
  Adults, developed, posed                                          Weak to strong          Unknown
  Adults, remote, spontaneous                                       Unknown                 Unknown
  Adults, remote, posed                                             Weak to strong          Unknown
  Newborns, infants, toddlers                                       Unsupported             Unsupported
  Congenitally blind                                                Unsupported to weak     Unsupported

Emotion perception
  Adults, developed, choice-from-array                              Moderate to strong      Unknown
  Adults, developed, reverse correlation (with choice-from-array)   Moderate                Moderate
  Adults, developed, free labeling                                  Weak to moderate        Weak
  Adults, developed, virtual humans                                 Unknown                 Unknown
  Adults, remote, choice-from-array (before 2008)                   Moderate to strong      Unknown
  Adults, remote, choice-from-array (after 2008)                    Weak to moderate        Unsupported
  Adults, remote, free labeling (before 2008)                       Unsupported to strong   Variable
  Adults, remote, free labeling (after 2008)                        Unsupported             Unsupported
  Infants, young children                                           Unsupported             Unsupported

Note. Criteria were adopted from Haidt and Keltner (1999), who suggest that reliability rates of 70% to 90% are considered 
strong evidence for universal emotion perception (following Ekman, 1994); presumably, this would also hold for studies 
of expression production. Weak evidence is in the range of 20% to 40% (Haidt & Keltner, 1999, citing Russell, 1994). By 
interpolation, reliability between 41% and 69% would be considered moderate evidence. Reliability estimates below 20% 
are interpreted as findings that clearly do not support the reliability hypothesis. We also adopted these criteria for specificity 
findings. Developed = studies of participants from the U.S. and other more urban countries; spontaneous = spontaneous 
facial movements; posed = posed facial configurations; remote = studies of participants from small-scale, remote samples. 

A compendium of spontaneous human behavior 
published by the Austrian ethologist Irenaus Eibl- 
Eibesfeldt (1989) is sometimes cited as evidence for the 
hypothesis that certain facial movements are universal 
signals for specific emotion categories. No systematic 
coding procedure was used in his investigations, however. 
On close examination, Eibl-Eibesfeldt’s detailed 
descriptions appear to be more consistent with results 
from the studies of people living in more industrialized 
cultures that we reviewed above: People move their 
faces in a variety of ways during episodes belonging 
to the same emotion category. For example, as reported 
by Eibl-Eibesfeldt, a rapid eyebrow raise (called an 
eyebrow flash) is thought to express friendly recognition 
in some cultures but not all. Likewise, particular 
facial muscle movements are not specific expressions 
of a given emotion category. For example, an eyebrow 
flash would be coded with FACS AU 1 (inner brow 
raise) and AU 2 (outer brow raise), which are part of 
the proposed expressions for surprise and fear (Ekman, 
Levenson, & Friesen, 1983), sympathy (Haidt & Keltner, 
1999), and awe (Shiota, Campos, & Keltner, 2003). Even 
Eibl-Eibesfeldt acknowledged that eyebrow flashes 
were not unique expressions of specific emotion categories,
writing that they also served as a greeting, as
an invitation for social contact, as a sign of thanks, as 
an initiation of flirting, and as a general indication of 
“yes” in Samoans and other Polynesians, in the Eipo 
and Trobriand islanders in Papua New Guinea, and in 
the Yanomami of South America. In Japan, eyebrow 
flashes are considered an impolite way for adults to 
greet one another. In the United States and Europe, an 
eyebrow flash was observed when greeting friends but 
not when greeting strangers.

Posed facial movements. In the only study of expression
production in a natural environment that we could
find, researchers read a brief emotion story to people 
who live in the remote Fore culture of Papua New Guinea 
and asked each person to “show how his face would 
appear” (Ekman, 1972, p. 273) if he was the person 
described in the emotion story (sample size was not 
reported). Videotapes of 9 participants were shown to 34 
U.S. college students who were asked to judge which 
emotion was being expressed. U.S. participants were 
asked to infer the emotional meaning of the facial poses 
by choosing an emotion word from six choices provided 
by the experimenter (called a choice-from-array task; 
discussed on page 31 of this article). Participants inferred 
the intended emotional meaning at above-chance levels 
for smiling (happiness, 73%), frowning (sadness, 68%), 
scowling (anger, 51%), and nose-wrinkling (disgust, 46%), 
but not for surprise and fear (27% and 18%, respectively). 
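
The comparison with chance in a choice-from-array task can be made concrete. With six response options, guessing alone produces agreement about 17% of the time (1/6). The sketch below is illustrative only: it assumes, hypothetically, one independent judgment from each of the 34 raters per emotion category (the source does not report how judgments were aggregated), and the function names are hypothetical rather than taken from the original study.

```python
from math import comb

CHANCE = 1 / 6  # six response options in the choice-from-array task


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def above_chance(prop_agree: float, n_judgments: int, alpha: float = 0.05) -> bool:
    """One-sided exact binomial test of agreement against the guessing rate.

    n_judgments is an assumption (not reported in the source): the number
    of independent judgments behind the reported proportion.
    """
    k = round(prop_agree * n_judgments)  # approximate count of agreeing judgments
    return binomial_tail(k, n_judgments, CHANCE) < alpha


if __name__ == "__main__":
    # Proportions quoted in the text; 34 judgments each is a hypothetical choice.
    for label, prop in [("happiness", 0.73), ("sadness", 0.68), ("anger", 0.51),
                        ("disgust", 0.46), ("surprise", 0.27), ("fear", 0.18)]:
        print(label, above_chance(prop, 34))
```

Under these assumptions, agreement of 73% lies far above the guessing rate, whereas agreement of 18% is indistinguishable from it.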

Summary. Our review of the available evidence from 
expression-production studies in small-scale, remote cultures
is inconclusive because there are no systematic, controlled
observations that examine how people who live in
these cultural contexts spontaneously move their facial
muscles during emotional episodes. The evidence that
does exist suggests that common beliefs about emotion
may share some similarities across urban and small-scale
cultural contexts, but more research is needed before any
interpretations are warranted. These findings are summarized
in the fourth and fifth data rows of Table 3.

Studies of healthy infants and children 

The facial movements of infants and young children 
provide a valuable way to test common beliefs about 
emotional expressions because, unlike older children 
and adults, babies cannot exert voluntary control over 
their spontaneous expressive behaviors, meaning that 
they are unable to deliberately mask or portray instances 
of emotion in accordance with social demands. As a 
general rule, infants understand far more about the
world than they can easily convey through their physical
actions, making it difficult for experiments to distinguish
between what infants understand and what
they can do; the former often exceeds the latter (Pollak,
2009). Experiments must use human inference to determine
when an infant is in an emotional state, as is the
case in studies of adults (see Human Inference and
Assessing the Presence of an Emotional State, above).
The presence (or absence) of an instance of emotion 
is inferred (i.e., stipulated), either by a scientist (who 
exposes a child to something that is presumed to evoke 
an emotion episode) or by adult “raters” who infer the 
emotional meaning of the evoking situation or of the
child’s body movements and vocalizations (see Subjective
Measures of an Emotional Instance, above). In the
latter cases, inferences are measured by asking research
participants to label the situation or the child’s emotional
state by choosing an emotion word or image from
a small set of options, known as a choice-from-array
task. We address the strengths and weaknesses of
choice-from-array tasks (see Fig. 8) and the potential
risk of confirmatory bias with the use of such methods 
(see A Note on Interpreting the Data, below). 

There is also a risk, given the strong reliance on human 
inference, that scientists will implicitly confound the measurements
made in an experiment with their interpretation
of those measurements, in effect overinterpreting infant 
behavior as emotional, in part because these young 
research participants cannot speak for themselves. Some 
early and influential studies confounded the observation 
of facial movements with their interpreted emotional 
meaning, leading to the conclusions that babies as young 
as 7 months old were capable of producing an expression 
of anger. In fact, it is more scientifically correct to say that 
the babies were scowling. For example, in one study,
infants’ facial movements were coded as they were given 
a cookie, and then the cookie was taken away and placed 
out of reach, although it was still clearly visible. The 
babies appeared to scowl when the cookie was removed 
and not when it was in their mouths (Stenberg, Campos, 
& Emde, 1983). It is certainly possible that this repeated 
giving and taking away of the treat angered the infants, 
but the babies might also have been confused or just 
generally distressed. Without some independent evidence 
to indicate that a state of anger was induced, we cannot 
confidently conclude that certain facial movements in an 
infant reliably express a specific instance of emotion. 

The Stenberg et al. (1983) study illustrates some of 
the concerning design issues that have historically 
plagued studies with infants. First, emotion-inducing
situations are often defined with common-sense intuitions
rather than objective evidence (e.g., an infant is
assumed to become angry when a cookie is taken 
away). In fact, it is difficult to know how any individual 
infant at any point in time will construct and react to 
such an event. Second, when an infant produces a facial 
movement, a common assumption is used to infer its 
emotional meaning without additional measures or controls
(e.g., when a scowling facial configuration is
observed, it is assumed to necessarily be an expression 
of infant anger, even if there are no data to confirm that 
a scowl is specific to instances of anger in an infant). 
In fact, years later, as their research program progressed,
Campos and his team revised their earlier interpretation
of their findings, later concluding that the
facial movements in question (infants lowering and
drawing together their brows, staring straight ahead, or
pressing their lips together) were more generally associated
with unpleasantness and distress and were not
reliable expressions of anger (e.g., Camras et al., 2007).

Fig. 8. Culturally common facial configurations extracted using reverse correlation from 62 models of facial configurations. Red
coloring indicates stronger action unit (AU) presence and blue indicates weaker AU presence. Some words and phrases that refer
to emotion categories in Chinese are not considered emotion categories in English. Adapted with permission of the American Psychological
Association, from Revealing Culturally Common Facial Expressions of Emotion, by Jack, R. E., Sun, W., Delis, I., Garrod,
O. G., and Schyns, P. G., in the Journal of Experimental Psychology: General, Vol. 145. Copyright © 2016; permission conveyed through
Copyright Clearance Center, Inc.

The inference problem is particularly poignant when 
fetuses are studied. For example, in a 4-D ultrasonography
study performed with fetuses at 20 gestational
weeks, researchers observed the fetuses knitting their
brows and described the facial movements as expressions
of distress (Dondi et al., 2012). Yet the fetuses were
producing these facial movements during situations in
which fetal distress was unlikely. The brow-knitting was
observed during noninvasive ultrasound scanning that
did not involve perturbation of the fetus, and the pregnant
women were at rest. Furthermore, the scans were
brief, and the facial movements were interspersed with
other movements that are typically not thought to
express negative emotions, such as smiling and mouthing.
This is an example of making a scientific inference
about the presence of an emotion solely on the basis
of the facial movements without converging evidence
that the organism in question (a fetus) was in a distressed
state. Doing so highlights the common but unsound
assumption that certain facial movements reliably and specifically
index instances of the same emotion category.

The study of expression production in infants and 
children must deal with other design challenges—in 
addition to the reliance on human inference—that are 
shared by experiments with adult participants. In particular,
most experiments observe facial movements in
a restricted range of laboratory settings rather than in 
the wide variety of situations that naturally occur in 
everyday life. The frequent use of only a single stimulus 
or event to observe facial movements for each emotion 
category limits the opportunity to discover whether the 
expressions of an emotion category vary systematically 
with context. 

Even with these design considerations, the scientific 
findings from studies of infants and children parallel 
those that we encountered from studies on adults: Weak 
to no reliability and specificity in facial muscle movements
is the norm, not the exception (again, using the
criteria from Haidt & Keltner, 1999, that are presented in
Table 2 of the current article). Although some older studies
concluded that infants produce invariant emotional
expressions (e.g., Izard et al., 1995; Izard, Hembree,
Dougherty, & Spizzirri, 1983; Izard, Hembree, & Huebner,
1987; Lewis, Ramsay, & Sullivan, 2006), these conclusions
have been largely superseded by more recent work and
in many cases have been reinterpreted and revised by
the authors themselves (e.g., Lewis et al., 2006).

Facial movement in fetuses, infants, and young children.
The most detailed research on facial movements
in fetuses and newborns has focused on smiles. Human
fetuses lower their brows (AU4), raise their cheeks (AU6),
wrinkle their noses (AU9), crease their nasolabial folds (AU11),
pull the corners of their lips (AU12), show their tongues 
(AU19), part their lips (AU25), and stretch their mouths 
(AU27)—all of which have been implicated, to some 
degree, in adult laughter. Infants sometimes produce 
facial movements that resemble adult laughter when other 
considerations suggest that they are in distress and pain 
(Dondi et al., 2012; Hata et al., 2013; Reissland, Francis, & 
Mason, 2013; Reissland, Francis, Mason, & Lincoln, 2011; 
Yan et al., 2006). Within 24 hr of birth, infants raise their 
cheek muscles in response to being touched (Cecchini, 
Baroni, Di Vito, & Lai, 2011). But these movements are not 
specific to smiling; neonates also raise their cheeks (contract
the zygomatic muscle) during rapid eye movement
(REM) sleep, when drowsy, and during active sleep (Dondi
et al., 2007). A neonatal smile with raised cheeks is caused
by brainstem activation (Rinn, 1984), and likely reflects
internally generated arousal rather than expressing or communicating
an emotion or even a more general feeling of
pleasure (Emde & Koenig, 1969; Sroufe, 1996; Wolff, 1987).
So, it remains unclear whether fetal or neonatal facial muscle
movements have any relationship to specific emotional
episodes as well as more generally to pleasant feelings or
to other social meanings (Messinger, 2002). 

In fact, it is not clear that fetal and neonatal facial 
movements always have a psychological meaning (consistent
with a behavioral-ecology view of facial movements;
Fridlund, 2017). Newborns appear to produce
some combinations of facial movements for muscular 
reasons. For example, infants produce facial movements 
associated with the proposed expression for surprise 
(open mouth and raised eyebrows) in situations that 
are unsurprising, just because opening the mouth necessarily
raises their eyebrows; conversely, infants do
not consistently show the proposed expressive configuration
for surprise in contexts that are likely to be
surprising (Camras, 1992; Camras, Castro, Halberstadt, 
& Shuster, 2017). The facial movement that is part of 
the proposed expression for sadness (brows oblique 
and drawn together) occurs when infants attempt to lift 
their heads to direct their gaze (Michel, Camras, & 
Sullivan, 1992). 

In addition, newborns produce many facial movements
that co-occur with fussiness, distress, focused
attention, and distaste (Oster, 2005). Newborns react to 
being given sweet versus sour liquids; for example, when 
given a sour liquid, newborns make a nose-wrinkle 
movement, which is part of the proposed expressive 
configuration for disgust (Ganchrow, Steiner, & Daher,
1983). However, other studies show that newborns also 
make this facial movement when given sweet, salty, and 
bitter tastes (e.g., Rosenstein & Oster, 1988). Still other 
studies show that nose-wrinkling does not always occur 
when infants taste lemon juice (i.e., when that facial 
movement is expected; Bennett, Bendersky, & Lewis, 
2002). More generally, infants rarely produce consistent 
facial movements that cleanly map onto any single emotion
category. Instead, infants produce a variety of facial
configurations that suggest a lack of emotional specificity
(Matias & Cohn, 1993).

There are further examples that illustrate how infant 
facial movements lack strong reliability and specificity. 
In a study of 11-month-old babies from the United 
States, China, and Japan, infants saw a toy gorilla head 
that growled (to induce fear) or their arms were 
restrained (to induce anger; Camras et al., 2007). Observers
judged the infants to be fearful or angry on the basis
of their body movements, yet the infants produced the
same facial movements in the two situations. In another
study, 1-year-old infants were videotaped in situations
in which they were tickled (to elicit joy), tasted sour
flavors (to elicit disgust), watched a jack-in-the-box (to
elicit surprise), had an arm restrained (to elicit anger),
and were approached by a masked stranger (to elicit
fear; Bennett et al., 2002). Infants whose arms were
restrained (to purportedly induce an instance of anger)
produced the facial actions associated with the proposed
facial configuration for an anger expression only
24% of the time (low reliability); instead, 80 infants 
(54%) produced the facial actions proposed as the 
expression of surprise, 37 infants (25%) produced the 
facial actions proposed as the expression of joy, 29 
infants (19%) produced the facial actions proposed as 
the expression of fear, and 28 (18%) produced the facial 
actions proposed as the expression of sadness. This 
dramatic lack of specificity was observed for all emotion 
categories studied. An equal number of babies produced
facial movements that are proposed as the expressions
of joy, surprise, anger, disgust, and fear when
a sour liquid was placed on infants’ tongues to elicit
disgust. When infants faced a masked stranger, only 20 
(13%) produced facial movements that corresponded to 
the proposed expression for fear, compared with 56 
infants (37%) who produced facial actions associated 
with the proposed expression for instances of joy.^^ 
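
The reliability and specificity pattern in these figures can be tabulated directly. The minimal sketch below uses only the proportions quoted above for two of the conditions in Bennett et al. (2002). The data structure and the `summarize` helper are hypothetical conveniences, and listing "rival" configurations is an informal stand-in for specificity, not the scoring procedure used in the original study.

```python
# Proportion of infants producing each proposed configuration, as quoted
# in the text. Rows need not sum to 1 because an infant could produce
# more than one configuration.
observed = {
    "arm restraint": {
        "target": "anger",
        "rates": {"anger": 0.24, "surprise": 0.54, "joy": 0.25,
                  "fear": 0.19, "sadness": 0.18},
    },
    "masked stranger": {
        "target": "fear",
        "rates": {"fear": 0.13, "joy": 0.37},
    },
}


def summarize(situation: str) -> tuple[float, list[str]]:
    """Return (reliability, rivals) for one eliciting situation.

    reliability: rate of the proposed (target) configuration.
    rivals: non-target configurations produced at least as often as the
            target, an informal indicator of poor specificity.
    """
    entry = observed[situation]
    target = entry["target"]
    reliability = entry["rates"][target]
    rivals = sorted(
        name for name, rate in entry["rates"].items()
        if name != target and rate >= reliability
    )
    return reliability, rivals


if __name__ == "__main__":
    for situation in observed:
        print(situation, summarize(situation))
```

For the arm-restraint condition, the helper reports a reliability of 0.24, with the proposed surprise and joy configurations occurring at least as often as the proposed anger configuration.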

Taken together, these findings suggest that infant 
facial movements may be associated with affect (i.e., 
the affective features of experience, such as distress or 
arousal), as originally described by Bridges (1932), or 
may communicate a desire to approach or avoid something
(e.g., Lewis, Sullivan, & Kim, 2015). Affective features
such as valence (ranging from pleasantness to
distress) and arousal (ranging from activated to quiescent)
are continuous properties of experience, just as
approach/avoidance is an affective property of action. 
These affective features are shared by many instances 
of different emotion categories, as well as with mental 
events that are not considered emotional (as discussed 
in Box 9 in the Supplemental Material), but this does 
not diminish their importance or effectiveness for 
infants.^"* Over time, infants likely learn to differentiate 
mental events with simple affective features into episodes
of emotion with additional psychological features
that are specific to their sociocultural contexts, making 
them maximally effective at eliciting needed responses 
from their caregivers (L. F. Barrett, 2017b; Holodynski & 
Friedlmeier, 2006; Weiss & Nurcombe, 1992; Witherington, 
Campos, & Hertenstein, 2001). 

The affective meaning of an infant’s facial movements
may, in fact, be what makes these movements
so salient for adult observers. When infants move their 
lips, open their mouths, or constrict their eyes, adults 
view infants as feeling more pleasant or unpleasant 
depending on the context (Bolzani Dinehart et al.,
2005). Infant expressions thus do have a reliable link
to instrumental effects in the adults who observe
them, playing an important role in parent-infant interaction,
attachment, and the beginnings of social communication
(Atzil, Gao, Fradkin, & Barrett, 2018;
Feldman, 2016). For example, if an infant cries with 
narrowed eyes, adults infer that the infant is feeling 
negative, is having an unwanted experience, or is in 
need of help, but if the infant makes that same eye 
movement while smiling, adults infer that the infant is 
experiencing more positive emotion. These data consistently
point to the usefulness of facial movements in
the communication of arousal and valence (properties of
affect; see Box 9 in the Supplemental Material), particularly
when combined with other communicative features
such as vocalizations. Even when episodes of
more specific emotions start to emerge, we do not yet 
have evidence that facial movements map reliably and 
regularly to a specific emotion category. 

Young children begin to produce adult-like facial 
configurations after the first year of life. Even then, 
however, children’s facial movements continue to lack 
strong reliability and specificity (Bennett et al., 2002; 
Camras & Shutter, 2010; Matias & Cohn, 1993; Oster, 
2005). Examples of a wide-eyed, gasping facial configuration,
proposed as the expression of fear (see Fig. 4),
have rarely been observed or reported in young infants
(Witherington, Campos, Harriger, Bryan, & Margett,
2010). Nor do infants reliably produce a scowling facial
configuration, proposed as the expression of anger
(again, see Fig. 4). Infants scowl when they cry or are
about to cry (Camras, Fatani, Fraumeni, & Shuster,
2016). A frown (mouth corner depression, AU15) is not
reliably and specifically observed when infants are frustrated
(Lewis & Sullivan, 2014; Sullivan & Lewis,
2003). A smile (cheek raising and lip corner pulling,
AU6 and AU12) is not reliably observed when infants
are in visually engaging or mastery situations, or even
when they are in pleasant social interactions (Messinger,
2002).

Experiments that observe young children’s facial 
movements in naturalistic settings find largely the same 
results as those conducted in controlled laboratory settings.
For example, one study trained ethnographic
videographers to record a family’s daily activities over
4 days (Sears, Repetti, Reynolds, & Sperling, 2014). Coders
judged whether or not the child from each participating
family made a scowling facial configuration
(referred to as an expression of anger), a frowning 
facial configuration (referred to as an expression of 
sadness), and so on, for the six (presumed) emotion 
categories included in the study—happiness, sadness, 
surprise, disgust, fear, and anger. During instances that 
were coded as anger (defined as situations that included 
verbal disagreements or sibling bickering, requests for
compliance and/or reprimands from parents, parent
refusal of child requests, during homework, and sibling
provocation), a variety of facial movements were
observed, including frowns, furrowed brows, and eye-rolls,
as well as a variety of vocalizations, including
shouts and whining, and both nonaggressive and 
aggressive physical behaviors. 

Perhaps the most telling observation for our purposes
is that expressions of anger were more often
vocal than facial. During anger situations, children 
raised their voices 42% of the time, followed by whining 
about 21% of the time. By contrast, children made 
scowling facial configurations only 16.2% of the time.^^ 
Yet even during anger situations, the facial movements 
were predominantly frowning, which can be part of 
many different proposed facial configurations. The 
authors reasoned that children engage in specific 
behaviors to obtain specific goals, and that behaviors 
such as whining are more likely to attract attention and 
possibly change parental behavior than is a facial movement.
Indeed, it is easier for parents to ignore a negative
facial expression than a whining child in the room!
Similar findings for low reliability and specificity of the 
facial configurations presented in Figure 4 were recently 
observed in a naturalistic study that videotaped 7- to 
9-year-old children and their mothers discussing a conflict
during their visit to the laboratory related to homework,
chores, bedtime, or interactions with siblings
(Castro, Camras, Halberstadt, & Shuster, 2018). 

Summary. Newborns and infants react to the world 
around them with facial movements. There is not yet sufficient
evidence, however, to conclude that these facial
movements reliably and specifically express the instances 
of any emotion category (findings are summarized in the 
sixth data row of Table 3). When considered alongside 
vocalizations and body movements, there is consistent 
evidence that infant facial movements reliably signal distress,
interest, and arousal and perhaps serve as a call for
help and comfort. In young children, instances of the 
same emotion category appear to be expressed with a 
variety of different muscle movements, and the same 
muscle movements occur during instances of various 
emotion categories, and even during nonemotional 
instances. It may be the case that reliability and specificity
emerge through learning and development (see Box 10
in the Supplemental Material), but this remains an open 
question that awaits future research. 

Studies of congenitally blind individuals 

Another source of evidence to test the common view 
comes from observations of facial movements in people 
who were born blind. The assumption is that people
who are blind cannot learn by watching others which
facial muscles to move when expressing emotion. On
the basis of this assumption, several studies have
claimed to find evidence that congenitally blind individuals
express emotions with the hypothesized facial
configurations in Figure 4 (e.g., blind athletes were
reported to show expressions that are reliably interpreted
as shame and pride; Tracy & Matsumoto, 2008;
see also Matsumoto & Willingham, 2009). People who
are born blind learn through other sensory modalities,
however (for a review, see Bedny & Saxe, 2012),
and therefore can learn whatever regularities exist 
between emotional states and facial movements from 
hearing descriptions in conversation, in books and 
movies, and by direct instruction.^® As an example of 
such learning, Olympic athletes who won medals 
smiled only when they knew they were watched by 
other people, such as when they were on the podium 
facing the audience; in other situations, such as while 
they waited behind the podium or while they were on 
the podium facing away from people but toward a flag, 
they did not smile (but presumably were still very 
happy; Fernandez-Dols & Ruiz-Belda, 1995). Such findings
are consistent with the behavioral ecology view of
facial expressions (Fridlund, 1991, 2017) and with more
recent sociological evidence that smiles are social cues
that can communicate different social messages depending
on the cultural context (J. Martin, Rychlowska,
Wood, & Niedenthal, 2017). 

The limitations that apply to studies of emotional 
expressions in sighted individuals, reviewed throughout 
this article, are even more applicable to scientific studies
of emotional expressions in the blind. Participants
are given predetermined emotion categories that constrain
their possible responses, and facial movements
are often quantified by human judges who have their 
own biases when inferring the emotional meaning of 
facial movements (e.g., Galati, Miceli, & Sini, 2001; 
Galati, Scherer, & Ricci-Bitti, 1997; Valente et al., 2018).
In addition, people who are blind make additional, 
often unusual movements of the head and the eyes 
(Chiesa, Galati, & Schmidt, 2015) to better hear objects 
or echoes. These unusual movements might influence 
expressive facial movements. More importantly, they 
reveal whether a participant is blind or sighted, and 
this knowledge can bias human raters who are judging 
the presence or absence of facial movements in emotional
situations.

Helpful insights about the facial expressions of congenitally
blind individuals come from a recent review
(Valente et al., 2018) that surveyed 21 studies published
between 1932 and 2015. These studies observed how 
blind participants move their faces during instances of 
emotion and then compared those movements with
both the proposed expressive forms in Figure 4 and the
facial movements of sighted people. Both spontaneous 
facial movements and posed movements were tested. 
Eight older studies (published between 1932 and 1977) 
reported that congenitally blind individuals spontaneously
expressed emotions with the proposed facial
configurations in Figure 4, but Valente et al. (correctly) 
questioned the objectivity of these studies because the 
data were based largely on subjective impressions 
offered by researchers or their assistants. 

The 13 studies published between 1980 and 2015 
were better designed: Researchers videotaped participants’
facial movements and described them using a
formal facial coding system for adults (e.g., FACS) or a 
similar coding system for children. There are too few 
of these studies and the sample sizes are insufficient to 
conduct a formal meta-analysis, but taken together they 
suggest that, in general, congenitally blind individuals 
spontaneously moved their faces in ways similar to 
sighted individuals during instances of emotion: Both 
groups expressed instances of anger, disgust, fear, happiness,
sadness, or surprise with the proposed expressive
configurations (or their individual AUs) in Figure
4, with either weak reliability or no reliability, and 
neither group produced any of the configurations with 
any specificity (e.g., Galati et al., 2001; Galati et al., 
1997; Galati, Sini, Schmidt, & Tinti, 2003). The lack of 
specificity is not surprising given that, on closer inspection,
several of the studies discussed in Valente et al.
(2018) compared emotion categories that systematically
differ in their prototypical affective properties, contrasting
facial movements in pleasant and unpleasant circumstances
(e.g., Cole et al., 1989), or observed facial
movements only in pleasant circumstances without 
distinguishing the facial AUs for the happiness category 
from other positive emotion categories (e.g., Chiesa 
et al., 2015). As a consequence, the findings from these 
studies cannot be interpreted unambiguously as evidence 
specifically pertaining to emotional expressions, per se. 

Congenitally blind and sighted individuals were similar
to one another in the variety of their spontaneous
facial movements, but they differed in their posed facial
configurations. After listening to descriptions of situations
that were assumed to elicit an instance of anger,
sadness, fear, disgust, surprise, and happiness, sighted
participants posed their faces with the proposed expressive
forms for the negative emotion categories in Figure
4 at higher levels of reliability and specificity than did 
blind participants (Galati et al., 1997; Roch-Levecq, 
2006). These findings suggest that sighted individuals 
share common beliefs about emotional expressions, 
replicating other findings with posed expressions (see 
Table 3, third data row), whereas congenitally blind 
individuals may share these beliefs to a lesser degree;
their knowledge of social rules for producing those
configurations on command differs from that of
sighted individuals.

Taken together, the evidence from studies of blind 
individuals is consistent with the other scientific evidence 
reviewed so far (see Table 3). Even in the absence of 
visual experience, blind individuals, like sighted individuals,
develop the ability to spontaneously make a variety
of facial movements to express emotion, but those movements
do not reliably and specifically configure in the
manner proposed by the common view of emotion
(depicted in Fig. 4). Learning to voluntarily pose the
proposed expressions in Figure 4 does seem to covary
with vision, however, further emphasizing that posed and 
spontaneous expressions should be treated as different 
phenomena. Further scientific attention is warranted to 
examine how congenitally blind individuals learn, via 
other sensory modalities, to express emotions. 

Summary of scientific evidence on the 
production of facial expressions 

The scientific findings we have reviewed thus far, dealing
with how people actually move their faces during
emotional events, do not strongly support the common
view that people reliably and specifically express
instances of emotion categories with spontaneous facial 
configurations that resemble those proposed in Figure 
4. Adults around the world, infants and children, and 
congenitally blind individuals all show much more variability
than commonly hypothesized. Studies of posed
expressions further suggest that people believe that 
particular facial movements express particular emotions 
more reliably and specifically than is warranted by the 
scientific evidence. Consequently, it is misleading to 
refer to facial movements with commonly used phrases 
such as “emotional facial expression,” “emotional 
expression” or “emotional display.” More neutral phrases 
that assume less, such as “facial configuration” or “pattern
of facial movements” or even “facial actions,” are
more scientifically accurate and should be used instead. 

We next turn our attention to the question of whether 
people reliably and specifically infer certain emotions 
from certain patterns of facial movements, shifting 
our focus from studies of production to studies of perception.
It has long been assumed that emotion perception
provides an indirect way of testing the common
view of expression production, because facial expressions,
when they are assumed to be displays of emotional
states, are thought to have coevolved with the
ability to recognize and read them (Ekman, Friesen, &
Ellsworth, 1972). For example, Shariff and Tracy
(2011) have suggested that emotional expression and
emotion perception likely coevolved as an integrated
signaling system (for additional discussion, see Jack,
Sun, Delis, Garrod, & Schyns, 2016).^* In the next 
section, we review the scientific evidence on emotion 
perception. 

Perceiving Emotions From Facial 
Movements: A Review of the Scientific 
Evidence 

For over a century, researchers have directly examined 
whether people reliably and specifically infer emotional 
meaning in the facial configurations presented in Figure 
4. Most of these studies are interpreted as evidence for 
people’s ability to recognize or decode emotion in facial 
configurations, on the assumption that the configurations 
broadcast or signal emotional information to be 
recognized or detected. This is yet another example of 
confusing what is known and what is being tested. A 
more correct interpretation is that these studies evaluate 
whether or not people reliably and specifically infer, 
attribute, or judge emotion in those facial configurations. 
The pervasive tendency to confuse inference and 
recognition may explain why very few studies have 
actually investigated the processes by which people 
detect the onset and offset of facial movements and 
infer emotions in those movements (i.e., few studies 
consider the mechanisms by which people infer emotional 
states from detecting and perceiving facial movements; 
for discussion, see Lynn & Barrett, 2014; 
Martinez, 2017a, 2017b). In this section, we first review 
the design of typical emotion-perception experiments 
that are used to test the common view that emotions can 
be reliably and specifically “read out” from facial movements. 
We also examine the emotions people infer from 
the facial movements in dynamic, computer-generated 
faces, a class of studies that offers an interesting alternative 
way to study emotion perception, and in virtual 
humans, which provide the opportunity for a more 
implicit approach to studying emotion perception. 

The anatomy of a typical experiment 
designed to test the common view 

For a person (a perceiver) to infer that another person 
is in an emotional state by looking at that person’s facial 
movements, the perceiver must have many competencies. 
People move their faces continuously (i.e., real 
human faces are never still), so a perceiver must notice 
or detect the relevant facial movements in question and 
discriminate them from other facial movements (that 
is, the perceiver must be able to set a perceptual boundary 
to know when the movements begin and end, and, 
for example, that a scowl is different from a sneer). To 
do this, the perceiver must be able to identify (or 
segment) the movements as an ensemble or pattern 
(i.e., bind them together and distinguish them from 
other movements that are normally inferred to be irrelevant). 
And the perceiver must be able to infer similarities 
and differences between different instances of facial 
movements, as specified by the task (e.g., categorize a 
group of facial movements as instances expressing 
anger). This categorization might involve merely 
labeling the facial movements, referred to as action 
identification (describing how a face is moving, such 
as smiling) or it might involve inferring that a particular 
mental state caused the actions, referred to as mental 
inference or mentalizing (inferring why the action is 
performed, such as a state of happiness; Vallacher & 
Wegner, 1987). In principle, the categorization could 
also involve inferring a situational cause for the actions, 
but in practice, this question is rarely investigated in 
studies of emotion perception. The overwhelming 
majority of studies ask participants to make mental 
inferences, although, as we discuss later in this section, 
there appears to be important cultural variation in 
whether emotions are perceived as situated actions or 
as mental states that cause actions. 

The use of posed configurations of facial movements 
in assessments of emotion perception. In the majority 
of the experiments that study emotion perception, researchers 
ask participants to infer emotion in photographs of 
posed facial configurations (such as those in Fig. 4, but 
without the FACS codes). In most studies, the configurations 
have been posed by people who were not in an emotional 
state when the photos were taken. In a growing 
number of studies, the poses are created with computer-generated 
humans who have no actual emotional state. 
As a consequence, it is not possible to assess the accuracy 
(i.e., validity) of perceivers’ emotional inferences 
and, correspondingly, data from emotion-perception 
studies should not be interpreted as support for the validity 
of the common view of emotional expressions (except 
insofar as these are simply stipulated to be the consensus). 
As is the case in expression-production studies, it is 
more appropriate to interpret participants’ responses in 
terms of their agreement (or consensus) with common 
beliefs (which may vary by language and culture). 

Even more serious is the fact that the proposed 
expressive facial configurations in Figure 4, which are 
routinely used as stimuli in emotion-perception studies, 
do not capture the wider range of muscle movements 
that are observed when people actually express instances 
of these emotion categories in the lab or in everyday life. 
A recent study that mined more than 7 million images 
from the Internet (Srinivasan & Martinez, 2018; for method, 
see Box 7 in the Supplemental Material) identified multiple 
facial configurations associated with the same 
emotion-category label and its synonyms: 17 distinct 
facial configurations were associated with the word 
happiness, five with anger, four with sadness, four with 
surprise, two with fear, and one with disgust. The 
different facial configurations associated with each 
emotion word were more than mere variations on a universal 
core expression; they were distinctive sets of facial 
movements. 

Measuring emotion perception. The typical emotion-perception 
experiment takes one of several forms, summarized 
in Table 4. Choice-from-array tasks, in which participants 
are asked to match photos of facial configurations 
and emotion words (with or without brief stories), have 
dominated the study of emotion perception since the 
1970s. For example, a meta-analysis of emotion-perception 
studies published in 2002 summarized 87 studies, 83 (95%) 
of which exclusively used a choice-from-array response 
method (Elfenbein & Ambady, 2002). This method has 
been widely criticized for more than two decades, however, 
because it limits the possibility of observing evidence that 
could disconfirm the common view. Choice-from-array 
tasks strongly constrain the possible meanings that participants 
can infer in a facial configuration, such as a photograph 
of a scowl, because they can choose only the 
options provided in the experiment (usually a small number 
of emotion words). In fact, the preponderance of 
choice-from-array tasks in the scientific study of emotion 
perception has been identified as one important factor 
that has helped perpetuate and sustain the common view 
(Russell, 1994). Other tasks exist for assessing emotion 
perception (see Table 4), including those that use a free-labeling 
method, where participants are asked to freely 
nominate words to label photographs of posed facial 
configurations, rather than choosing a word from a small 
set of predefined options. For example, on viewing a 
scowling configuration, participants might offer responses 
like “angry,” “sad,” “confused,” “hungry,” or even “wanting 
to avoid a social interaction.” By allowing participants 
more freedom in how they infer meaning in a facial configuration, 
free labeling makes it equally possible to 
observe evidence that could either support or disconfirm 
the common view. 

Recent innovations in measuring emotion perception 
use computer-generated faces or heads rather than photographs 
of posed human faces. One method, called 
reverse correlation, measures participants’ internal 
model of emotional expressions (i.e., their mental representations 
of which facial configurations are likely to 
express instances of emotion) by observing how participants 
label an avatar head that displays random combinations 
of animated facial action units (Yu, Garrod, & 
Schyns, 2012; for a review, see Jack, Crivelli, & Wheatley, 
2018; Jack & Schyns, 2017). As each pattern appears on 
the computer screen (on a given test trial), participants 
infer its emotional meaning by choosing an emotion 
label from a set of options (a choice-from-array 
response). After thousands of trials, researchers estimate 
the statistical relationship between the dynamic patterns 
of facial movements and each emotion word (e.g., disgust) 
to reveal participants’ beliefs about which facial 
configurations are likely to express different emotion 
categories. 
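
The logic of reverse correlation can be illustrated with a minimal simulation. The sketch below is not the procedure of Yu, Garrod, and Schyns (2012); it assumes a hypothetical set of four action units, a hypothetical simulated perceiver who tends to choose "disgust" when AU9 is shown, and a simple difference in labeling rates as the estimate of each action unit's contribution.

```python
import random

# Hypothetical subset of facial action units shown on each trial.
ACTION_UNITS = ["AU4", "AU9", "AU12", "AU26"]

def simulated_observer(active, rng):
    """Hypothetical perceiver: labels a pattern 'disgust' mostly when AU9 is active."""
    p_disgust = 0.8 if "AU9" in active else 0.1
    return "disgust" if rng.random() < p_disgust else "other"

def reverse_correlate(n_trials=5000, seed=0):
    """Estimate, per action unit, how much its presence raises the 'disgust' rate."""
    rng = random.Random(seed)
    # counts[au][is_active] = [number of trials, number labeled 'disgust']
    counts = {au: {True: [0, 0], False: [0, 0]} for au in ACTION_UNITS}
    for _ in range(n_trials):
        # Each action unit is switched on at random, independently of the others.
        active = {au for au in ACTION_UNITS if rng.random() < 0.5}
        label = simulated_observer(active, rng)
        for au in ACTION_UNITS:
            cell = counts[au][au in active]
            cell[0] += 1
            cell[1] += label == "disgust"
    effects = {}
    for au in ACTION_UNITS:
        on_n, on_hits = counts[au][True]
        off_n, off_hits = counts[au][False]
        # P(disgust | unit active) minus P(disgust | unit inactive)
        effects[au] = on_hits / on_n - off_hits / off_n
    return effects

effects = reverse_correlate()
for au, effect in effects.items():
    print(f"{au}: {effect:+.2f}")
```

In this toy setup only AU9 shows a large positive difference, which recovers the simulated perceiver's belief rather than any fact about how disgust is actually expressed.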

A second approach using computer-generated faces 
has participants interact with more fully developed virtual 
humans (Rickel et al., 2002), also known as embodied 
conversational agents (Cassell et al., 2000). 
Software-based virtual humans look like and act like 
people (for examples, see Fig. 9). They are similar to 
characters in video games in their surface appearance 
and are designed to interact face-to-face with humans 
using the same verbal and nonverbal behaviors that 
people use to interact with one another. The underlying 
technologies used to realize virtual humans vary considerably 
in approach and capability, but most virtual-human 
models can be programmed to make context-sensitive, 
dynamic facial actions that would, when used by a person, 
typically communicate emotional information to 
other people (see Box 11 in the Supplemental Material 
for discussion). The majority of the scientific studies 
with virtual humans were not designed to test whether 
human participants infer specific emotional meaning in 
a virtual human’s facial movements, but their design 
makes them useful for studying when and how facial 
movements take on meaning as emotional expressions: 
Unlike all the other ways of assessing emotion perception 
discussed so far, which ask participants to make 
explicit inferences about the emotional cause of facial 
configurations, interactions with virtual humans offer 
the possibility of learning how a participant implicitly 
infers emotional meaning during social interactions. 

Testing the common view of emotion perception: interpreting 
the scientific observations. Traditionally, in 
most experiments, if participants reliably infer the hypothesized 
emotional state from a facial configuration (e.g., 
inferring anger from a scowling configuration) at levels 
that are greater than what would be expected by chance, 
then this is taken as evidence that people “recognize that 
emotional state in its facial display.” It is more scientifically 
correct, however, to interpret such observations as 
evidence that people infer an emotional state (i.e., they 
consistently make a reverse inference) at greater-than-chance 
levels. Only when reverse inferences are observed 
in a reliable and specific way within an experiment can 
scientists reasonably infer that participants are perceiving 
an instance of a certain emotion category in a certain 
facial configuration; technically, the inference holds only 
for emotion perception as it occurs in the particular situations 
contained in the experiment (because situations 
Table 4. Pros and Cons of Common Tasks for Measuring Explicit Emotion Perception 


General considerations 

Participants are typically asked to infer emotional meaning in posed, rather than spontaneous, facial configurations. Spontaneous 
or candid facial configurations typically produce much lower levels of agreement in emotion-perception studies (e.g., Kayyal & 
Russell, 2013; Naab & Russell, 2007). 

Participants are typically asked to infer emotional meaning in static, nonmoving facial configurations (i.e., in photographs rather 
than movies). This reduces the ecological validity of the findings for how people infer emotional meaning in faces in the 
real world. In the real world, people have to infer when a set of movements begin and end; this is called discrimination or 
detection. Moreover, there is information in the dynamics of facial movements (Jack & Schyns, 2017; Krumhuber, Kappas, & 
Manstead, 2013), but dynamic facial movements, particularly when they are spontaneous, do not always produce higher levels 
of agreement in emotion-perception studies. Dynamic movements add realism and intensity and improve levels of agreement 
primarily when movements are degraded or are artificial. 

Participants are typically asked to infer emotional meaning in exaggerated facial configurations, which are said to have greater 
“source clarity” (Ekman, Friesen, & Ellsworth, 1972). They reduce the ecological validity of the findings for how people infer 
emotional meaning in faces in the real world. The facial configurations used in most experiments (see Fig. 4) are caricatures— 
they are exaggerated to maximally distinguish one from another. Caricatures are easier to label (categorize) than are typical 
stimuli, particularly when the categories in question are highly interrelated (Goldstone, Steyvers, & Rogosky, 2003). 

Participants are typically asked to infer emotional meaning in highly selected facial configurations. In early studies, a smaller set 
of exaggerated facial configurations were culled from much larger sets of posed faces (involving several thousand faces; for a 
discussion, see Gendron & Barrett, 2017; Russell, 1994). 

Only a single task is used in most experiments (i.e., participants are asked to infer emotion in facial configurations via one 
method of responding). Ideally, multiple tasks should be used with the same population of participants to determine whether 
convergent results are obtained. This approach is rarely taken (but for examples, see Crivelli, Jarillo, Russell, & Fernandez-Dols, 
2016; Gendron, Roberson, van der Vyver, & Barrett, 2014b; Gendron, Hoemann, et al., 2018). 

Test-retest reliability is rarely evaluated but is critical. A number of contextual factors are known to influence judgments, 
including a perceiver’s internal state. Test-retest assessments are rarely done for practical reasons. 

Most experiments ask participants to infer emotion in a disembodied face, alone, without context. This reduces the ecological 
validity of the findings for how people infer emotional meaning in faces in the real world. In addition, a growing number 
of experiments now show that context is an important, and sometimes dominant, source of information when people infer 
emotional meaning in a facial configuration. (See Box 3 in the Supplemental Material.) For example, situational information 
tends to dominate perception of emotion in faces in common, everyday events (Carrera-Levillain & Fernandez-Dols, 1994), 
even when situations are more ambiguous than the exaggerated facial configurations being judged (Carroll & Russell, 1996, 
Study 3). 

Most studies do not report evidence about the specificity of emotion perceptions or the frequency with which people infer the 
nonintended emotional meaning from a facial configuration. 

Until recently, the large majority of experiments included only one pleasant emotion category (happiness) among several 
unpleasant emotion categories (anger, fear, sadness, etc.). This may be one reason that agreement rates are so high for 
smiles. In the past few years, experiments have included a larger variety of pleasant emotion categories (pride, awe, gratitude, 
etc.), but there continues to be debate over whether these emotion categories are expressed with consistent, specific facial 
configurations. 

Choice-From-Array Task: matching photos of facial configurations and emotion words (with or without brief stories). 
Response options are limited to those provided in the task. 

Participants are (a) shown a facial configuration and asked to infer its emotional meaning by choosing an emotion word from a 
small set of words or (b) presented with an emotion word that labels an emotion category (e.g., sadness) or a brief story about 
a typical instance of an emotion category (e.g., “the boy’s much loved dog just died and he is sad”) along with two or three 
photographs of faces (typically posed into one of the configurations presented in Fig. 4) and then asked to choose the facial 
configuration that they judge best matches the emotional episode described in the word or vignette. Typically, each emotion 
category is represented by a single scenario. 

Words influence how the brain processes visual inputs from faces (e.g., Doyle & Lindquist, 2018; Gendron, Lindquist, Barsalou, 
& Barrett, 2012). Stories can prime action perceptions, as well. More generally, signal detection analyses have shown that 
choice-from-array tasks encourage biased perceptual responding (e.g., DeCarlo, 2012). Choice-from-array tasks are still 
commonly used, however, because they are easy, efficient, and straightforward for participants to understand. Choice-from-array 
responses are also easy for scientists to score. Most studies using continuous judgments (rather than forced choice) find that 
participants do not infer emotional meaning in facial configurations in a yes/no or on/off sort of way (Russell, 1994). 

The fact that participants are exposed to the same facial configurations and emotion words over and over allows them to learn 
the intended pairings even if they do not know them to begin with (N. L. Nelson & Russell, 2016). 

An emotion word does not necessarily have a unique correspondence to a single emotion category for all people in a given 
culture (i.e., they may differ in emotional granularity; L. F. Barrett, 2004, 2017b; Lindquist & Barrett, 2008) or people from 
different cultures. Concerns about individual word meaning are why choice-from-array tasks using stories might be preferable to those 
using single words. 

A small range of answers is predetermined by the experimenter, making it easier for participants to provide the answers scientists 
expect. For example, by constraining which words participants were allowed to choose from, frowns were consensually 
labeled as fear, wide-eyed, gasping faces were labeled as surprise (Russell, 1993). Scowling faces are more likely to be 
perceived as fearful when paired with the description of danger (Carroll & Russell, 1996, Study 1) and appear determined or 
puzzled depending on the story they are presented with (Carroll & Russell, 1996, Study 2). 

Participants are asked to make yes/no decisions when assigning a facial configuration to an emotion category. Multiple emotion 
words may apply to a single configuration (i.e., people might infer more than one emotional meaning in a face), but the 
option to infer multiple emotional meanings is rarely given to participants. Continuous judgments using, for example, a Likert-type 
scale ranging from 1 to 7 would solve both of these problems and also allow analysis of the similarity among facial 
configurations (which evidence shows is important; e.g., Jack, Sun, Delis, Garrod, & Schyns, 2016; Kayyal & Russell, 2013). 
Similarity allows scientists to discover the emotional meanings that people implicitly assign to a facial configuration, rather than 
having people explicitly state them (see further discussion of similarity below). 

A participant might decide that no emotion word provided applies to a facial configuration, but the option to respond this way 
is rarely given to participants (they are usually forced to choose an emotion word; for discussion, see Frank & Stennett, 2001). 
See Cordaro et al. (2017) for an example of this design feature. 

If a participant hears a story and is asked to choose between two faces (e.g., a scowl and smile), he or she can give the expected 
answer (e.g., scowl) simply by figuring out that “smile” is NOT correct. For example, after hearing a story about anger, a 
participant is shown a scowl and a smile and can choose the scowl merely by realizing the smile is not correct (on the basis 
of valence). This is similar to getting an answer right on a multiple-choice test by eliminating all the alternatives—you do 
not actually know the right answer, but you figured it out because of the structure of the task. A similar point can be made 
about showing a single face and asking participants to label it with a word by selecting from among a small set of options. 
Participants use a process of elimination strategy: Words that are not chosen on prior trials are selected more frequently, 
inflating agreement levels (DiGirolamo & Russell, 2017). 

If participants hear a story about anger and must choose between a scowl and a smile, they can figure out that the scowl is 
correct merely because they are distinguishing between negative (scowl) and positive (smile). If participants hear a story about 
anger and must choose between a scowl and a frown, they can figure out that the scowl is correct merely because they are 
distinguishing between high arousal (scowl) and low arousal (frown). 

In tasks that involve brief stories or vignettes about emotion, only one typical story is offered for each emotion category, making 
it more difficult to observe any variation within a category. 

Free-Sorting Task: photos of facial configurations are sorted into groupings, such that each grouping represents 
a perceived category. Cue-to-Cue Matching: matching photos of facial configurations to a recording of posed 
vocalization. 

Most participants still spontaneously use words to guide their sorting and organize their groupings. Free sorting and cue matching 
are ideal for preverbal participants or those with semantic deficits (e.g., Lindquist, Gendron, Barrett, & Dickerson, 2014). 

Similarity Task: Judgments between pairs of facial configurations. Perceptual Matching: Indicating whether two 
photos of facial configurations belong to the same emotion category. 

It is inefficient and time-consuming to judge the similarity of all pairs of facial configurations. For a set of 100 faces, this requires 
roughly (100 × 100)/2 = 5,000 different similarity judgments (exactly 100 × 99/2 = 4,950 distinct pairs). Alternatively, participants 
can arrange face stimuli on a computer screen, and all pairwise similarity judgments can be computed from the arrangement 
(the SPAM method proposed by Goldstone, 1994; e.g., see Hout, Goldinger, & Ferguson, 2013). 
This procedure also solves the problem that the same pair of stimuli can have a different judged similarity 
depending on which item is presented first if face pairs are presented sequentially (the judged similarity of two objects, A and 
B, can depend on the order in which they are presented; the similarity between A and B is not always judged to be the same 
as that between B and A; Tversky, 1977). Other advantages are that categories can be discovered, rather than prescribed, and 
verbal associations are minimized. Analyses of similarity judgments typically yield more continuous similarity relations between 
emotion categories along affective dimensions (see Russell & Barrett, 1999). 

Free-Labeling Task: photos of facial configurations are labeled with words offered by participants (unconstrained by 
experimenter). 

Forcing people to translate faces into words is not a good idea, because much of the information from faces cannot be easily 
captured in words (Ekman, 1994). In addition, facial expressions did not evolve to represent specific verbal labels (Ekman, 
1994, p. 270). “Regardless of the language, of whether the culture is Western or Eastern, industrialized or preliterate, these 
facial expressions are labeled with the same emotion terms: happiness, sadness, anger, fear, disgust, and surprise” (Ekman, 
1972, p. 278). These are not special criticisms of free-labeling studies; they apply to all studies that ask people to label a face 
with words, including the choice-from-array tasks. 

There is no widely accepted method for scoring (i.e., categorizing) freely provided responses (Ekman, 1994, p. 274). Most scientists 
group together similar words (synonyms), so that a variety of words can be used to show evidence of a correct response (e.g., a 
frowning face, which is the proposed expression for sadness, could be labeled as sad, grieving, disappointed, blue, despairing, 
and so on). Scientists routinely use databases that indicate synonyms (e.g., WORDNET; used in Srinivasan & Martinez, 2018). In 
addition, it is possible to do data-driven groupings of emotion words into semantic categories (e.g., Jack et al., 2016; Shaver, 
Schwartz, Kirson, & O’Connor, 1987). The more serious problem is that early studies using free labeling (e.g., Boucher & Carlson, 
1980; Izard, 1971) did not provide enough information in the method sections about how freely provided labels were grouped. 

Using freely chosen labels in a study of different cultures is difficult because it may be hard to find adequate translations (Ekman, 
1994, p. 274). A given emotion word, such as sadness, can correspond to different emotion concepts (with different features) in 
different languages (e.g., Wierzbicka, 1986, 2014). A single emotion word in one language can refer to more than one concept 
in another language (e.g., Pavlenko, 2014). Some languages have no one-to-one translation for English emotion words, and 
some emotion concepts in other languages are not directly translatable into English emotion words (see L. F. Barrett, 2017b; 
Russell, 1991; Jack et al., 2016). This is not a special criticism of free-labeling studies, however; it holds for any experiment 
that uses emotion words requiring translation, including choice-from-array tasks. A standard solution to this problem is to use 
both forward and backward translation (e.g., a word spoken in Hadzane is translated into English and then back translated 
into Hadzane; this process estimates whether or not the translation has fidelity). An even better method is to elicit features for 
the emotion words in question, including typicality of those features, to determine the fidelity of translation (e.g., de Mendoza, 
Fernandez-Dols, Parrott, & Carrera, 2010). 

Scientifically, issues with translation are manageable if scientists allow phrases to stand in for specific words. 

Using only single words will always fail to capture much of the rich information in faces. Participants often provide multiple 
words or even longer descriptions of situations, behaviors, or behaviors in situations (e.g., see Gendron et al., 2014b; Russell, 
1994). Such data are time consuming to code and analyze. 

Even when participants are told that photographs are of people trying to express an emotion, they often offer nonemotion labels. 
For example, Izard (1971) found that people offered labels such as deliberating, clowning, skepticism, pain, and so on (as 
reported in Russell, 1994). This is not necessarily evidence that participants did not understand the task asked of them. It might 
be evidence that these facial configurations are not specific for expressing emotions. 

Note: Response tasks are arrayed in order from those that most constrain participants’ responses (making it difficult to observe evidence that 
can disconfirm common beliefs about emotion) to those that least constrain participants’ responses (making it easier to observe variation and 
disconfirm common beliefs). For detailed design concerns about choice-from-array tasks, see Russell (1994, 1995). 


are never randomly sampled). If the emotion-perception 
evidence is replicated across experiments that sample 
people from the same culture, then the interpretation can 
be generalized to emotion perceptions in that culture. Only 
when the findings generalize across cultures (that is, are 
replicated across experiments that sample people from different 
cultures) is it reasonable to conclude that people 
universally infer a specific emotional state when perceiving 
a specific facial configuration. These findings can be 
interpreted as evidence about the reliability and specificity 
of producing emotional expressions if the coevolution 
assumption is valid (i.e., that emotional expressions and 
their perception coevolved as an integrated signaling system; 
Ekman et al., 1972; Jack et al., 2016; Shariff & Tracy, 



Fig. 9. Examples of virtual humans. Virtual humans are software-based artifacts that look like and act like people. (a) The system that 
used this virtual human is described in Feng, Jeong, Kramer, Miller, and Marsella (2017). (b) This virtual human is reproduced from Zoll, 
Enz, Aylett, and Paiva (2006). (c) This virtual human is reproduced from Hoyt, C., Blascovich, J., and Swinth, K. (2003). Social inhibition 
in immersive virtual environments. Presence, 12(X), 183-195, courtesy of The MIT Press. (d) The system that was used to create this virtual 
human is described in Marsella, Johnson, and LaBore (2000). 
2011). The findings can be interpreted as evidence about 
emotion recognition only if the reverse inference has 
been verified as valid (i.e., if it can be verified that the 
person in the photograph is, indeed, in the expected 
emotional state). 
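
To make the phrase "greater than what would be expected by chance" concrete, the sketch below computes an exact one-sided binomial probability for agreement in a choice-from-array task. The sample size, number of response options, and hit count are hypothetical, and chance is assumed to be one divided by the number of options, which is a simplification of the criteria cited in this article (Haidt & Keltner, 1999).

```python
from math import comb

def binomial_tail(hits, n, chance):
    """Exact P(X >= hits) for X ~ Binomial(n, chance)."""
    return sum(
        comb(n, k) * chance**k * (1 - chance) ** (n - k)
        for k in range(hits, n + 1)
    )

# Hypothetical data: 40 participants each choose one of 6 emotion words
# for a scowling configuration, and 22 of them choose "anger".
n_participants = 40
n_options = 6
hits = 22

agreement = hits / n_participants
p_value = binomial_tail(hits, n_participants, 1 / n_options)
print(f"agreement = {agreement:.2f}, one-sided p = {p_value:.2e}")
```

A small p value here supports only the claim that participants agree with the hypothesized label more often than guessing would produce. It says nothing about whether the person photographed was in the inferred emotional state, which is the distinction between inference and recognition drawn above.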

Studies of healthy adults from the United 
States and other developed nations 

Studies that measure emotion perception with choice-from-array 
tasks. The most recent meta-analysis of 
emotion-perception studies was published by Elfenbein 
and Ambady (2002). It statistically summarized 87 experiments 
in which more than 22,000 participants from more 
than 20 cultures around the world inferred emotional meaning 
in facial configurations and other stimuli (e.g., posed 
vocalizations). The majority of participants were sampled 
from larger-scale or developed countries, including Argentina, 
Brazil, Canada, Chile, China, England, Estonia, Ethiopia, 
France, Germany, Greece, Indonesia, Ireland, Israel, 
Italy, Japan, Malaysia, Mexico, the Netherlands, Scotland, 
Singapore, Sweden, Switzerland, Turkey, the United States, 
Zambia, and various Caribbean countries. The majority of 
studies (95%) used posed facial configurations; only four 
studies had participants label spontaneous facial movements, 
a dramatic example of the challenges facing validity 
that we discussed earlier. All but four studies used a 
choice-from-array response method to measure emotion 
inferences, a good example of the challenges facing 
hypothesis disconfirmation that we discussed earlier. 

The results of the meta-analysis, presented in Figure 
10, reveal that perceivers inferred emotions in the facial 
configurations of Figure 4 in line with the common 
view, well above chance levels (using the criteria set 
out by Haidt and Keltner, 1999, presented in Table 2 of 
the current article). Results provided strong evidence 
that, when participants viewed posed facial configurations 
made by people from their own culture, they 
reliably perceived the expected emotion in those 
configurations: Scowling facial configurations were perceived 
as anger expressions, wide-eyed facial configurations 
were perceived as fear expressions, and so on, 
for all six emotion categories. Moderate levels of reliability 
were observed when perceivers were labeling 
facial configurations posed by people from other cultures; 
this difference in reliability between same-culture and 
cross-culture judgments is referred to as an in-group 
advantage (see Box 12 in the Supplemental Material). 
The majority of emotion-perception studies did not 
report whether the hypothesized facial configurations 
were perceived with any specificity (e.g., how likely 
was a scowl to be perceived as expressing an instance 
of emotion categories other than anger, or as an instance 
of a mental category that is not considered emotional). 


Without information about specificity, no firm conclusions
can be drawn about the emotional meaning
of the facial configurations in Figure 4, especially 
for the translational purpose of inferring someone’s 
emotional state from their facial comportment in real 
life. 
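
The difference between reliability and specificity can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the response counts are invented for illustration and are not data from any study cited here, and the function names are our own. As an assumption, specificity is operationalized through a false-positive rate, that is, how often the expected label is also applied to other facial configurations.

```python
# Hypothetical response counts (invented for illustration only).
# Outer keys: facial configuration shown; inner keys: label chosen by perceivers.
counts = {
    "scowl":        {"anger": 60, "disgust": 25, "determination": 15},
    "nose_wrinkle": {"anger": 40, "disgust": 50, "determination": 10},
    "frown":        {"anger": 30, "sadness": 60, "determination": 10},
}


def reliability(counts, configuration, label):
    """Share of responses to `configuration` that used the expected `label`."""
    row = counts[configuration]
    return row.get(label, 0) / sum(row.values())


def false_positive_rate(counts, configuration, label):
    """Share of responses to all OTHER configurations that used `label`.

    A low value means the label is applied specifically to `configuration`;
    a high value means the label is not specific to it.
    """
    other_rows = [row for name, row in counts.items() if name != configuration]
    hits = sum(row.get(label, 0) for row in other_rows)
    total = sum(sum(row.values()) for row in other_rows)
    return hits / total


print(reliability(counts, "scowl", "anger"))          # 0.6
print(false_positive_rate(counts, "scowl", "anger"))  # 0.35
```

In this invented example, "anger" is chosen for the scowl well above chance (reliability of 0.6), yet it is also chosen for 35% of responses to the other configurations. A study that reported only the first number would leave the second unknown, which is the situation described above.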

Nonetheless, most of the studies cited in the Elfenbein
and Ambady (2002) meta-analysis interpret their reliability
findings alone (i.e., inferring anger from a scowling
face, disgust from a nose-wrinkled face, fear from
a wide-eyed, gasping face, etc.) as evidence of accurate
reverse inferences. Such interpretations may explain
why many scientists who study emotion, when surveyed,
indicated that they believe compelling evidence
exists for the hypothesis that certain emotion categories
are each expressed with a unique, universal facial configuration
(see Ekman, 2016) and interpret variation in
emotional expressions to be caused by cultural learning 
that modifies what are presumed to be inborn universal 
expressive patterns (e.g., Cordaro et al., 2018; Ekman, 
1972; Elfenbein, 2013). Cultural learning has also been 
hypothesized to modify how people “decode” facial 
configurations during emotion perception (Buck, 1984). 

Studies that measure emotion perception with free-labeling
tasks. Experimental methods that place fewer
constraints on participants’ inferences (Table 4) provide
considerably less support for the common view of emotional
expressions. In the least constrained experimental
task, called free labeling, perceivers freely volunteer a
word (emotion or otherwise) that they believe best captures
the meaning in a facial configuration, rather than
choosing from a small set of experimenter-provided 
options. In urban samples, participants who freely label 
facial configurations produce the expected emotion 
labels with weak reliability (when labeling spontaneously 
produced facial configurations) to moderate reliability 
(when labeling posed facial configurations). Participants’
responses usually reveal weak specificity when specificity
is assessed at all (for examples and discussion, see
Russell, 1994; also see Naab & Russell, 2007). 

For example, participants in a study by Srinivasan
and Martinez (2018) were sampled from multiple countries.
They were asked to freely provide emotion words
in their native languages (English, Spanish, Mandarin
Chinese, Farsi, Arabic, and Russian) to label each of 35
facial configurations that had been cross-culturally
identified. Their labels provided evidence of a moderately
reliable correspondence between facial configurations
and emotion categories, but there was no evidence
of specificity (see Fig. 11). Multiple facial configurations
were associated with the same emotion category
label (e.g., 17 different facial configurations were associated
with the expression of happiness, five with
anger, four with sadness, four with surprise, two with
fear, and one with disgust). This many-to-many mapping
is inconsistent with the common view that the
facial configurations in Figure 4 are universally recognized
as expressing the hypothesized emotion category,
and it gives evidence of variation that is far beyond
what is proposed by the basic-emotion view. Some of
this variability may come from different cultures and
languages, but there is variability even within a single
culture and language. Evidence of this many-to-many
mapping is also apparent in free-labeling tasks in
small-scale, remote samples (Gendron, Crivelli, &
Barrett, 2018), which we discuss in the next section.

Fig. 10. Emotion-perception findings. Average effect sizes for perceptions of facial configurations in which 95% of
the articles summarized used choice-from-array to measure participants’ emotion inferences. Data are from Elfenbein
and Ambady (2002). The images presented on the x-axis are for illustrative purposes only and were not necessarily
used in the articles summarized in this meta-analysis.

[Figure 11 (image not reproduced): Emotion Label Offered (% of Responses); labels shown: Anger, Disgust, Fear, Happiness, Sadness, Surprise, Nonaffective, Action.]

Fig. 11. Free labeling of facial configurations across five language groups. Data are from
Srinivasan and Martinez (2018). The proportion of times participants offered emotion-category labels
(or their synonyms) are reported. The facial configurations presented were chosen by researchers
to represent the best match to the hypothetical facial configurations in Figure 4 on the basis of the
action units (AUs) present. No configuration discovered in this study exactly matches the AU
configurations proposed by Darwin or documented in prior research. According to standard scientific
criteria, universal expressions of emotion should elicit agreement rates that are considerably higher
than those reported here, generally in the 70% to 90% range, even when methodological constraints
are relaxed (Haidt & Keltner, 1999). Specificity data are not available for the Elfenbein and Ambady
(2002) meta-analysis.

Studies that measure emotion perception with the
reverse-correlation method. Using a choice-from-array
response method with the reverse-correlation method is
an inductive way to learn people’s beliefs about which
facial configurations express the instances of an emotion
category (for reviews, see Jack et al., 2018; Jack & Schyns,
2017). In such studies, participants view thousands of
random combinations of AUs that are computer generated
on an avatar head and label each one by choosing
an emotion word from a set of predefined options. All of
the facial configurations labeled with the same emotion
word (e.g., anger) are then statistically combined for
each participant to estimate that person’s belief about
which facial movements express instances of the corresponding
emotion category. One recent study using the
reverse-correlation method with participants from the
United Kingdom and China found evidence of variation
in the facial movements that were judged to express a
single emotion category as well as similarity in the facial
movements that were judged to express different categories
(Jack et al., 2016). The study first identified groupings
of emotion words that are widely discussed in the
scientific literature (which, we should note, is dominated
by English): 30 English words grouped into eight emotion
categories for the sample from the United Kingdom
(happy/excited/love, pride, surprise, fear, contempt/disgust,
anger, sad, and shame/embarrassed) and 52 Chinese
words grouped into 12 categories in the Chinese
sample (joyful/excitement, pleasant surprise, great
surprise/amazement, shock/alarm, fear, disgust, anger, sad,
embarrassment, shame, pride, and despise). The
reverse-correlation method revealed 62 separate facial
configurations: The same emotion category in a given culture was
associated with multiple models of facial movements
because synonyms of the same emotion category were
associated with distinctive models of facial movements.
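
The logic of the reverse-correlation method described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the procedure used by Jack and colleagues: AUs are treated as simple on/off values with no timing or intensity, the "participant" is a hypothetical rule (`simulated_perceiver`) that we invented, and the "statistical combination" is assumed to be a plain difference between the per-label mean and the base rate of each AU.

```python
import random


def reverse_correlation(trials, n_aus):
    """Estimate a perceiver's model of each label from labeled random stimuli.

    `trials` is a list of (aus, label) pairs, where `aus` is a tuple of 0/1
    values (AU absent/present). For every label, the result holds one number
    per AU: how much more often that AU was present in stimuli given this
    label than in the stimulus set as a whole.
    """
    base = [sum(aus[i] for aus, _ in trials) / len(trials) for i in range(n_aus)]
    by_label = {}
    for aus, label in trials:
        by_label.setdefault(label, []).append(aus)
    return {
        label: [sum(r[i] for r in rows) / len(rows) - base[i] for i in range(n_aus)]
        for label, rows in by_label.items()
    }


def simulated_perceiver(aus):
    """Hypothetical stand-in for a participant choosing from a fixed array."""
    if aus[0] and not aus[1]:
        return "anger"
    if aus[1] and not aus[0]:
        return "happy"
    return "other"


rng = random.Random(0)  # fixed seed so the sketch is repeatable
N_AUS = 5
trials = []
for _ in range(2000):
    aus = tuple(rng.randint(0, 1) for _ in range(N_AUS))
    trials.append((aus, simulated_perceiver(aus)))

models = reverse_correlation(trials, N_AUS)
for label, weights in sorted(models.items()):
    print(label, [round(w, 2) for w in weights])
```

In the estimated model for "anger", the first AU receives a clearly positive weight and the second a clearly negative one, whereas the AUs that the simulated perceiver ignores stay near zero. The estimate therefore recovers the perceiver's belief about which movements go with the label; it says nothing about whether people actually make those movements when angry.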

Amidst this variability, Jack and colleagues also found
that these 62 separate facial configurations could be
summarized as four prototypes, which are presented in
Figure 8 along with the corresponding emotion words
with which they were frequently associated. Each prototype
was described with a unique set of affective
features (combinations of valence, arousal and dominance).
A comparison of the four estimated configurations
with the common view presented in Figure 4 and
with the basic-emotion hypotheses listed in Table 1
reveals some striking similarities: Configuration 1 in
Figure 8 most closely resembles the proposed expression
for happiness, Configuration 2 is similar to a combination
of the proposed expressions for fear and anger,
Configuration 3 most closely resembles the proposed expression
for surprise, and Configuration 4 is similar to a combination
of the proposed expressions for disgust and anger.
Taken together, these findings suggest that, at the most 
general level of description, participants’ beliefs about 
emotional expressions (i.e., their internal models of 
which facial movements expressed which emotions) 
were consistent with the common view (indeed, they 
could be taken to constitute part of the common view); 
when examined in finer detail with more granularity, 
however, the findings also give evidence of substantial 
within-category variation in beliefs about the facial 
movements that express instances of the same emotion 
category. This observation suggests that the way the 
common view is often described in scientific reviews, 
depicted in the media, and used in many applications 
does not, in fact, do justice to people’s more varied 
beliefs about facial expressions of emotion.

Studies that implicitly assess emotion perception
during interactions with virtual humans. Designers
typically study how a virtual human’s expressive
movements influence an interaction with a human participant.
Much of the early research modeling expressive
movements in virtual humans focused on endowing 
them with the facial expressions proposed in Figure 4. A 
number of studies have endowed virtual humans with 
blends of these configurations (Arya, DiPaola, & Parush, 
2009; Bui, Heylen, Poel, & Nijholt, 2004). Designers are 
also inspired by people’s beliefs about how emotions are 
expressed. Actors, for example, have been asked to pose 
facial configurations that they believe express emotions, 
which are then processed by graphical and machine-learning
algorithms to craft the relation between emotional
states and expressive movements (Alexander,
Rogers, Lambeth, Chiang, & Debevec, 2009). In another 
study, human subjects used a specially designed software 
tool to craft animations of facial movements that they 
believed express certain mental categories, including 
emotion categories. Then, other human subjects judged 
the crafted facial configurations (Ochs, Niewiadomski, & 
Pelachaud, 2010). Increasingly, data-driven methods are 
used that place people in emotion-eliciting conditions, 
capture the facial and body motion, and then synthesize 
animations from those captured motions (Ding, Prepin, 
Huang, Pelachaud, & Artières, 2014; Niewiadomski
et al., 2015; N. Wang, Marsella, & Hawkins, 2008). 

In general, studies with virtual humans show nicely
how the situational context influences people’s inferences
about the meaning of facial movements (de Melo,
Carnevale, Read, & Gratch, 2014). For example, in a 
game that allowed competition and cooperation 
(Prisoner’s Dilemma, Pruitt & Kimmel, 1977), a virtual 
human who smiled after making a competitive move 
evoked more competitive and less cooperative 
responses from human participants compared with a 
virtual human using an identical strategy in the game 
(tit-for-tat) but who smiled after cooperating. Virtual 
humans who make a verbal comment about a film that 
is inconsistent with their facial movements, such as 
saying they enjoyed the film but grimacing, quickly 
followed by a smile, were perceived as less reliable, 
trustworthy, and credible (Rehm & André, 2005).

The dynamics of the facial actions, including the
relative timing, speed, and duration of the individual
facial actions, as well as the sequence of facial muscle
movements over time, offer information over and above
the mere presence or absence of the movements themselves
and have an important influence on how human
perceivers interpret facial movements (e.g., Ambadar,
Cohn, & Reed, 2009; Jack & Schyns, 2017; Keltner, 1995;
Krumhuber, Kappas, & Manstead, 2013) and how much
they trust a virtual human during a social interaction
(Krumhuber, Manstead, Cosker, Marshall, & Rosin,
2009). Research with virtual humans has shown that
the dynamics of facial muscle movements are critical
for them to be perceived as emotional expressions
(Niewiadomski et al., 2015; Ochs et al., 2010). These
findings are consistent with research showing that the
temporal dynamics carry information about the emotional
meaning of facial movements that are made by
real humans (e.g., Kamachi et al., 2001; Krumhuber &
Kappas, 2005; Sato & Yoshikawa, 2004; for a review,
see Krumhuber et al., 2013).

Summary. Whether people can reliably perceive emotions
in the expressive configurations of Figure 4, as predicted
by the common view, depends on how participants
are asked to report or register their inferences (see Table
3). Hundreds of experiments have asked participants to
infer the emotional meaning of posed, exaggerated facial
configurations (such as those presented in Figure 4) by
choosing a single emotion word from a small number of
options offered by scientists, called choice-from-array
tasks. This experimental approach tends to generate
moderate to strong evidence that people reliably label
scowling facial configurations as angry, frowning facial
configurations as sad, and so on for all six emotion categories
that anchor the common view. Choice-from-array
tasks severely limit the possibility of observing evidence 
that can disconfirm the common view of emotional 
expressions, however, because they restrict participants’ 
options for inferring the psychological meaning of facial 
configurations by offering them a limited set of emotion 
labels. (As we discuss below, when people are provided 
with labels other than angry, sad, afraid, and so on, 
they routinely choose them; also see Carroll & Russell, 
1996; Crivelli et al., 2017). In addition, the specificity of 
emotion-perception judgments is largely unreported. 

Scientists often go further and interpret the better- 
than-chance reliability findings from these studies as 
evidence that scowls are expressions of anger, frowns 
are expressions of sadness, and so on. Such inferences 
are not sound, however, because most of these studies 
ask participants to infer emotion from posed, static 
faces that are likely limited in their validity (i.e., people 
posing facial configurations such as those depicted in 
Figure 4 are unlikely to be experiencing the hypothesized
emotional state). Furthermore, other ways of
assessing emotion perception, such as the reverse-correlation
method and free-labeling tasks, find
much weaker evidence for reliability of emotion
inferences. Instead, they suggest that what people actually
infer and believe about facial movements incorporates
considerable variability: In short, the common
view depicted in many reviews, summaries, the media,
and used in numerous applications is not an accurate
reflection of what people believe about facial expressions
of emotion, when these beliefs are probed in
more detail (in a way that makes it possible to
observe evidence that could disconfirm the common
view). In the next section, we discuss scientific evidence
from studies of emotion perception in small-scale
remote cultures, which further undermines the
common view.

Studies of healthy adults living in 
small-scale, remote cultures 

A growing number of studies examine emotion perception
in people from remote, nonindustrialized cultural
groupings. A more in-depth review of these studies can 
be found in Gendron, Crivelli, and Barrett (2018). Our 
goal here is to summarize the trends found in this line 
of research (see Table 5). 

Studies that measure emotion perception with choice- 
from-array tasks. During the period from 1969 to 1975, 
between five and eight small-scale samples from remote 
cultures in the South Pacific were studied with choice- 
from-array tasks; the goal was to investigate whether these 
participants perceived emotional expressions in facial 
movements in a manner similar to that of people from the 
United States and other industrialized countries of the 
Western world (see Fig. 12a). Our uncertainty in the number
of samples stems from reporting inconsistencies in the
published record (see note to Table 5). We present the 
findings here according to how the original authors 
reported their findings, despite the inconsistencies. Five 
samples performed choice-from-array tasks, three in which 
participants chose a photographed facial configuration to 
match one brief vignette that described each emotion category
(Ekman, 1972; Ekman & Friesen, 1971; Sorenson,
1975) and two in which they chose a photograph to match 
an emotion word (Ekman, Sorenson, & Friesen, 1969). All 
five samples performed some version of a choice-from- 
array task that provided strong evidence in support of 
cross-cultural reliability of emotion perception in small- 
scale societies. Evidence for specificity was not reported. 
Until 2008, all claims that anger, sadness, fear, disgust, 
happiness, and surprise are universally recognized (and 
therefore are universally expressed) were based largely 
on three articles (two of them peer reviewed) reporting 
on four samples (Ekman, 1972; Ekman & Friesen, 1971;
Ekman et al., 1969).

Since 2008, 10 verifiably separate experiments observing
emotional inferences in small-scale societies have
been published or submitted for publication. These
studies involve a greater diversity of social and ecological
contexts, including sampling five small-scale societies
across Africa and the South Pacific (see Fig. 12b) that
were tested with a greater diversity of research methods
listed in Table 4, including tasks that allow for the possibility
of observing cross-cultural variation in emotion
perception and therefore the possibility of disconfirming
the common view. Six samples registered their emotion
inferences using a choice-from-array task, in which participants
were given an emotion word and asked to
choose the posed facial configuration that best matched
it or vice versa (Crivelli, Jarillo, Russell, & Fernandez-Dols,
2016; Crivelli, Russell, Jarillo, & Fernandez-Dols,
2016; Crivelli et al., 2017, Study 2; Gendron, Hoemann,
et al., 2018, Study 2; Tracy & Robins, 2008).

Only one study (Tracy & Robins, 2008) reported that
participants selected an emotion word to match the
facial configurations similar to those in Figure 4 more
reliably than what would be expected by chance, and
effects ranged from weak (anger and fear) to strong
(happiness) with surprise and disgust falling in the
moderate range. Information about the specificity of
emotion inferences was not reported. A close examination
of the evidence from four studies by Crivelli and
colleagues suggests weak to moderate levels of reliability
for inferring happiness in smiling facial configurations
(all four studies), sadness in frowning facial
configurations (all four studies), fear in gasping, wide-eyed
facial configurations (three studies), anger in scowling
facial configurations (two studies), and disgust in
nose-wrinkled facial configurations (three studies). A
detailed breakdown of findings can be found in Box 13 
in the Supplemental Material. None of the studies found 
specificity for any facial configuration, however, except 
that smiling was reported as unique to happiness, but 
that finding was not replicated across samples. 

The final study using a choice-from-array task with
people from a small-scale, remote culture is important
because it involves the Hadza hunter-gatherers of Tanzania
(Gendron, Hoemann, et al., 2018, Study 2). The
Hadza are a high-value sample for two reasons. First,
universal and innate emotional expressions are hypothesized
to have evolved to solve the recurring fitness
challenges of hunting and gathering in small groups on
the African savanna (Pinker, 1997; Shariff & Tracy, 2011;
Tooby & Cosmides, 2008); the Hadza offer a rare opportunity
to study hunters and foragers who are currently
living in an ecosystem that is thought to be similar to
that of our Paleolithic ancestors. Second, the population
is rapidly disappearing (Gibbons, 2018). Before
this study, the Hadza had not participated in any studies
of emotion perception, although they have been the
subject of social cognition research more broadly (H.
C. Barrett et al., 2016; Bryant et al., 2016).

After listening to a brief story about a typical instance
of anger, disgust, fear, happiness, sadness, and surprise,
Hadza participants chose the expected facial

Table 5. Summary of Cross-Cultural Emotion Perception in Small-Scale Societies

Task | Culture | N | Citation | Level of support
Free labeling | Fore, PNG | 100 | Sorenson (1975), Sample 2 | None
Free labeling | Bahinemo, PNG | 71 | Sorenson (1975), Sample 3 | None
Free labeling | Hadza, Tanzania | 43 | Gendron, Hoemann, et al. (2018), Study 1 | None
Free labeling | Trobrianders, PNG | 32 | Crivelli et al. (2017), Study 1 | None
Free labeling | Sadong, Borneo | 15 | Sorenson (1975), Sample 4 | Strong
Cue-to-cue matching | Shuar, Ecuador | 23 | Bryant & Barrett (2008), Study 2 | Weak
Cue-to-cue matching | Himba, Namibia | 65 | Gendron et al. (2014) | None
Choice-from-array: matching face and words | Fore, PNG | 32 | Ekman, Sorenson, & Friesen (1969) | None
Choice-from-array: matching face and words | Mwani, Mozambique | 36 | Crivelli, Jarillo, Russell, & Fernandez-Dols (2016), Study 2 | None
Choice-from-array: matching face and words | Trobrianders, PNG | 24 | Crivelli et al. (2017), Study 2 | None
Choice-from-array: matching face and words | Trobrianders, PNG | 68 | Crivelli, Jarillo, Russell, & Fernandez-Dols (2016), Study 1 | None
Choice-from-array: matching face and words | Trobrianders, PNG | 36 | Crivelli, Russell, et al. (2016), Study 1a | None
Choice-from-array: matching face and words | Dioula, Burkina Faso | 39 | Tracy & Robins (2008), Study 2 | Weak
Choice-from-array: matching face and words | Sadong, Borneo | 15 | Ekman et al. (1969) | Strong
Choice-from-array: matching face and scenario | Hadza, Tanzania | 54 | Gendron, Hoemann, et al. (2018), Study 2 | None
Choice-from-array: matching face and scenario | Dani, New Guinea | 34 | Described in Ekman (1972) | Moderate
Choice-from-array: matching face and scenario | Fore, PNG | 189, 130 | Sorenson (1975), Sample 1 | Moderate
Choice-from-array: matching face and scenario | Fore, PNG | 189, 130 | Ekman and Friesen (1971) | Strong

Note. Findings for anger, disgust, fear, sadness, and surprise are summarized; happiness is the only pleasant category tested in all studies except 
for Tracy and Robins (2008). Therefore, in most studies, inferences of happiness in smiling faces do not uniquely reflect emotion perception 
and may be driven by valence perception (distinguishing pleasant from unpleasant). All studies used photographs of posed facial configurations 
that are similar to those in Figure 4, except Crivelli, Jarillo, Russell, and Fernandez-Dols, 2016, Study 2, which used dynamic as well as static 
posed configurations, and Crivelli et al., 2017, Study 1, which used static spontaneous configurations. The Bryant and Barrett (2008) study was 
designed to examine emotion perception from vocalizations but is included because perceivers matched them to faces; in addition, participants 
were tested in a second language (Spanish) in which they received training. Choice-from-array studies (except Gendron et al. 2014a, Study 2, 
and Gendron, Hoemann, et al., 2018, Study 2) did not carefully control whether target facial configurations and foils could be distinguished by 
valence and/or arousal. All participants were adults unless otherwise specified. Levels of support: “None” indicates that reliability and specificity 
were at chance levels or that any level of reliability above chance was combined with evidence of no specificity. “Weak” indicates that reliability 
was between 20% and 40% (weak) for at least a single emotion category other than happiness combined with above chance specificity for that 
category or reliability between 41% and 70% (moderate reliability) for at least a single category other than happiness with unknown specificity. 
“Moderate” indicates that reliability was between 41% and 70% combined with any evidence of above-chance specificity for a category other 
than happiness or reliability above 70% (strong reliability) for at least a single category other than happiness with unknown specificity. “Strong” 
indicates strong evidence of reliability (above 70%) and strong evidence of specificity for at least a single emotion category other than happiness. 
It is questionable whether the Sadong and the Fore subgroup with more other-group contact should be considered isolated (see Sorenson, 1975,
pp. 362 and 363), but we include them here to avoid falsely dichotomizing cultures as “isolated from” versus “exposed to” one another (Fridlund,
1994; Gewald, 2010). PNG = Papua New Guinea.

Specificity levels were not reported.
Sorenson (1975), Sample 2, included three groups of Fore participants (those with little, moderate, and most other-group contact). The pattern of findings is nearly identical for the subgroup with the most contact and the data reported for the Fore in Ekman et al. (1969); Sorenson described using a free-labeling method, whereas Ekman et al. (1969) described using a choice-from-array method. Ekman (1994) indicated, however, that he did not use a free-labeling method, implying that the samples may be distinct.
Participants were adolescents.
Specificity was inferred from reported results.
The sample size, marginal means, and exact pattern of errors reported for the Sadong samples are identical in Sorenson (1975), Sample 4, and Ekman et al. (1969); Sorenson described using a free-labeling method and Ekman et al. (1969) described using a choice-from-array method in which participants were shown photographs and asked to choose a label from a small list of emotion words.
Traditional specificity and consistency tests are inappropriate for this method, but the results are placed here based on the original author’s interpretation of multidimensional scaling and clustering results.
Participants were children.
The Dani sample reported in Ekman (1972) is likely to be a subset of the data from an unpublished manuscript.
Sorenson (1975), Sample 1 and Ekman and Friesen (1971) may be the same sample because the sample sizes and pattern of data are identical for all emotion categories except for the fear category, which is extremely similar, and for the disgust category, which includes responses for contempt in Ekman and Friesen (1971) but was kept separate in
Sorenson (1975).
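
The level-of-support criteria stated in the note to Table 5 can be written out as a small decision rule. The Python sketch below is one possible reading, offered under stated assumptions: the function names and the four specificity codes ("none", "unknown", "above_chance", "strong") are hypothetical, reliability below 20% is treated as chance, and combinations that the note does not mention (for example, weak reliability with unknown specificity) are resolved conservatively.

```python
LEVELS = ["None", "Weak", "Moderate", "Strong"]


def category_support(reliability_pct, specificity):
    """Level of support for one emotion category.

    reliability_pct: percentage of expected responses (0 to 100).
    specificity: "none", "unknown", "above_chance", or "strong"
                 (hypothetical codes introduced for this sketch).
    """
    # Evidence of no specificity yields "None" at any reliability.
    # Assumption: reliability under 20% counts as chance level.
    if specificity == "none" or reliability_pct < 20:
        return "None"
    if reliability_pct > 70:  # strong reliability
        if specificity == "strong":
            return "Strong"
        # Unknown specificity is "Moderate" per the note; treating
        # "above_chance" the same way is an assumption.
        return "Moderate"
    if reliability_pct > 40:  # moderate reliability (41% to 70%)
        if specificity in ("above_chance", "strong"):
            return "Moderate"
        return "Weak"  # unknown specificity
    # Weak reliability (20% to 40%)
    if specificity in ("above_chance", "strong"):
        return "Weak"
    return "None"  # assumption: the note does not cover this combination


def study_support(categories):
    """Highest level reached by any category other than happiness.

    categories: dict mapping category name to (reliability_pct, specificity).
    """
    best = "None"
    for name, (reliability_pct, specificity) in categories.items():
        if name == "happiness":
            continue  # the note excludes happiness from the rating
        level = category_support(reliability_pct, specificity)
        if LEVELS.index(level) > LEVELS.index(best):
            best = level
    return best


# Invented values for illustration; not taken from any study in Table 5.
example = {
    "happiness": (95, "strong"),
    "anger": (55, "unknown"),
    "fear": (25, "unknown"),
}
print(study_support(example))  # Weak
```

Under these rules, very high agreement on smiling faces does not raise a study's rating, because happiness is excluded, and high reliability paired with evidence of no specificity still counts as no support.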


configuration more often than chance only when the 
target and foil could be distinguished by the affective 
property referred to as valence. The finding that Hadza 
participants were successfully inferring pleasantness 


and unpleasantness is consistent with anthropological 
studies of emotion (Russell, 1991), linguistic studies 
(Osgood, May, & Miron, 1975), and findings from other 
recent studies of participants from small-scale societies,

[Figure 12 (map not reproduced). Panel a: Bahinemo of Papua New Guinea (Sorenson, 1975); Sadong of Borneo (Sorenson, 1975; Ekman et al., 1969); Dani of Indonesia (Western New Guinea) (Ekman, 1972; Ekman, Heider, et al., unpublished); Fore of Papua New Guinea (Ekman, Sorenson, & Friesen, 1969; Ekman & Friesen, 1971; 2 samples in Sorenson, 1975). Panel b: Shuar of Amazonian Ecuador (Bryant & Barrett, 2008); Dioula of Burkina Faso (Tracy & Robins, 2008); Himba of Namibia (2 samples; Gendron et al., 2014); Mwani of Mozambique (Crivelli, Jarillo, et al., 2016); Trobrianders of Papua New Guinea (Crivelli, Jarillo, et al., 2016; 2 samples in Crivelli, Russell, et al., 2016; 2 samples in Crivelli, Russell, et al., 2017).]

Fig. 12. Map of cross-cultural studies of emotion perception in small-scale societies. People in small-scale societies typically live in
groupings of several hundred to several thousand people who maintain autonomy in social, political and economic spheres. (a) Epoch
1 studies, published between 1969 and 1975, were geographically constrained to societies in the South Pacific. Studies that share the
same superscript letter may share the same samples. (b) Epoch 2 studies, published between 2008 and 2017, sample from a broader
geographic range including Africa and South America and are more diverse in the ecological and social contexts of the societies tested.
This type of diversity is a necessary condition for discovering the extent of cultural variation in psychological phenomena (Medin,
Ojalehto, Marin, & Bang, 2017). Adapted from Gendron, Crivelli, and Barrett (2018).


such as the Himba (Gendron, Roberson, van der Vyver,
& Barrett, 2014a, 2014b) and the Trobriand Islanders
(Crivelli, Jarillo, et al., 2016; also see Srinivasan &
Martinez, 2018, described in Box 7 in the Supplemental
Material); these studies showed that perceivers can reliably
infer valence but not arousal in facial configurations.
In addition, Hadza participants who had some contact 
with people from other cultures—they had some formal 
schooling or could speak Swahili, which is not their 
native language—were more consistently able to choose
the hypothesized facial configuration than were those
with no formal schooling who spoke minimal Swahili 
(for a similar finding with Fore participants in a free- 
labeling study, see Table 2 in Sorenson, 1975). Of the 
27 Hadza participants who had minimal contact with 
other cultures, only 12 reliably chose the wide-eyed, 
gasping facial configuration at above chance levels to 
match the fear story. (Compare this finding with the 
observation that the hypothesized universal expression 
for fear—a wide-eyed, gasping facial configuration—is 
understood as an aggressive, threatening display by 
Trobriand Islanders; Crivelli, Jarillo, & Fridlund, 2016; 
Crivelli, Russell, Jarillo, & Fernandez-Dols, 2016, 2017). 

Studies that measure emotion perception with free-labeling
tasks. During the period from 1969 to 1975,
between one and three small-scale samples from remote
cultures in the South Pacific were studied with free labeling
to investigate emotion perception (three samples
were reported in Sorenson, 1975; see Table 5 in the current
article). From 2008 onward, two additional studies
were conducted, one asking participants from the Trobriand
Islands to infer emotions in photographs of spontaneous
facial configurations (Crivelli et al., 2017, Study 1)
and the other asking Hadza participants to infer emotions 
in photographs of posed facial configurations (Gendron 
et al., 2018, Study 2). Taken together, these five studies 
provide little evidence that the facial configurations in 
Figure 4 are universally judged to specifically express 
certain emotion categories. The three free-labeling studies
reported in Sorenson (1975) produced variable results.
The only replicable finding appears to be that participants
labeled smiling facial configurations uniquely as
happiness in all studies (as the only pleasant emotion 
category tested). The two newer free-labeling studies 
both indicated that participants rarely spontaneously 
labeled facial configurations with the expected emotion 
labels (or their synonyms) above chance levels. Trobriand 
Islanders did not label the proposed facial configurations 
for happiness, sadness, anger, surprise, or disgust with the 
expected emotion labels (or their synonyms) at above 
chance levels (although they did label the faces consistently 
with other words; Crivelli et al., 2017, Study 1). 
Hadza participants labeled smiling and scowling facial 
configurations as happiness (44%) and anger (65%), 
respectively, at above chance levels (Gendron, Hoemann, 
et al., 2018, Study 2). The word anger was not used to 
uniquely label scowling facial configurations, however, 
and it was frequently applied to frowning, nose-wrinkled, 
and gasping facial configurations. 

Facial movements carry meaningful information, 
even if they do not reliably and specifically display 
emotional states. The more recent studies of people 
living in small-scale, remote cultures suggest two interesting 
and noteworthy observations. First, even though people 
may not routinely infer anger from scowls, sadness 
from frowns, and so on, they do reliably infer other social 
meanings for those facial configurations, because facial 
movements often carry important information about 
social motives and other psychological features (Crivelli, 
Jarillo, Russell, & Fernandez-Dols, 2016; Crivelli et al., 
2017; Rychlowska et al., 2015; Wood, Rychlowska, & 
Niedenthal, 2016; Yik & Russell, 1999; for a discussion, 
see Fridlund, 2017; J. Martin et al., 2017). For example, as 
we mentioned earlier, Trobriand Islanders consistently 
labeled wide-eyed, gasping faces (the proposed expressive 
facial configuration for the fear category) as signaling 
an intent to attack (i.e., a threat; for additional 
evidence in carvings and masks in a variety of cultures, 
including Maori, !Kung Bushmen, Himba, and Eipo, see 
Crivelli, Jarillo, & Fridlund, 2016; Crivelli, Jarillo, Russell, 
& Fernandez-Dols, 2016). 

Second, people do not always infer internal psychological 
states (emotions or otherwise) from facial movements. 
People who live in non-Western cultural 
contexts, including Himba and Hadza participants, are 
more likely to assume that other people’s minds are not 
accessible to them, a phenomenon called opacity of 
mind in anthropology (Danziger, 2006; Robbins & 
Rumsey, 2008). Instead, facial movements are perceived 
as actions that predict future actions in certain situations 
(e.g., a wide-eyed, gasping face is labeled as 
“looking”; Crivelli et al., 2017; Gendron, Hoemann, 
et al., 2018; Gendron et al., 2014b). Similar observations 
were unavailable for the earlier studies conducted by 
Ekman, Friesen, and Sorenson because, according to 
Sorenson (1975), they directed participants to provide 
emotion terms. When participants spontaneously 
offered an action label (e.g., “she is just looking”) or a 
social evaluation (e.g., “he is ugly,” or “he is stupid”), 
they were asked to provide an “affect term.” Such findings 
suggest that there may be profound cultural variation 
in the type of inferences human perceivers typically 
make when looking at other human faces in general, 
an observation that has been raised by a number of 
anthropologists and historians. 

A note on interpreting the data. To properly interpret 
the scientific evidence, it is crucial to consider the 
constraints placed on participants by the experimental 
tasks that they are asked to complete, summarized in Table 
4. In most urban and in some remote samples, experiments 
using choice-from-array tasks produce evidence supporting 
the common view: Participants reliably label scowling 
facial configurations as angry, smiling facial configurations 
as happy, and so on. (We do not yet know whether perceivers 
are uniquely labeling each facial configuration as a 
specific emotion because most studies do not report that 
information.) 

It has been known for almost a century that choice-from-array 
tasks help participants obtain a level of reliability 
in their emotion perceptions that is not routinely 
seen in studies using methods that allow participants 
to respond more freely, and this is one reason they 
were chosen for use in the first place (for a discussion, 
see Gendron & Barrett, 2009, 2017; Russell, 1994; Widen 
& Russell, 2013). When participants are offered words 
for happiness, fear, surprise, anger, sadness, and disgust 
to register their inferences for a scowling facial configuration, 
they are prevented from judging a face as 
expressing other emotion categories (e.g., confusion or 
embarrassment), nonemotional mental states (e.g., a 
social motive, such as rejection or avoidance), or physical 
events (e.g., pain, illness, or gas), thus inflating 
reliability rates within the task. When people are provided 
with other options, they routinely choose them. 
For example, participants label scowling faces as “determined” 
or “puzzled,” wide-eyed faces as “hopeful,” and 
gasping faces as “pained” when they are provided with 
stories about those emotions rather than with stories 
of anger, surprise, and fear (Carroll & Russell, 1996; 
also see Crivelli et al., 2017). The problem is not with 
the choice-from-array task per se; it is more with failing 
to consider alternative explanations for the observations 
in an experiment and therefore drawing 
unwarranted conclusions from the data. 
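
The inflation described in this paragraph can be made concrete with a small numerical sketch. The percentages and the fallback mapping below are hypothetical values chosen only for illustration; they are not data from any study cited here, and `forced_choice_counts` is an illustrative name. The sketch assumes that a perceiver whose preferred label is missing from the array settles for some other word that is offered.

```python
# Hypothetical illustration only: none of these numbers come from a study.
# Percentage of perceivers who would freely give each label to a scowling face.
free_label_pct = {
    "anger": 30,
    "determined": 25,
    "puzzled": 20,
    "pain": 15,
    "disgust": 10,
}

# The six words offered in a typical choice-from-array task.
array_options = {"happiness", "fear", "surprise", "anger", "sadness", "disgust"}

# Assumed fallback: the array word a perceiver settles for when the
# preferred label is not offered (an assumption, not an empirical mapping).
fallback = {"determined": "anger", "puzzled": "surprise", "pain": "sadness"}


def forced_choice_counts(free_pct, options, fallback_map):
    """Redistribute free-label percentages onto the words in the array."""
    forced = {}
    for label, pct in free_pct.items():
        choice = label if label in options else fallback_map[label]
        forced[choice] = forced.get(choice, 0) + pct
    return forced


forced_pct = forced_choice_counts(free_label_pct, array_options, fallback)
free_agreement = free_label_pct["anger"]   # agreement under free labeling
forced_agreement = forced_pct["anger"]     # agreement under forced choice
print(free_agreement, forced_agreement)
```

Under these assumed values, agreement on “anger” rises from 30% in free labeling to 55% in the forced-choice array, even though no perceiver has changed what he or she inferred from the face.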

Choice-from-array tasks may do more than just limit 
response options, making it difficult to disconfirm 
common beliefs. The emotion words provided during 
the task may actually encourage people to see anger 
in scowls, sadness in pouts, and so on, or to learn 
associations between a word (e.g., anger) and a facial 
configuration (e.g., a scowl) during the experiment 
(e.g., Gendron, Roberson, & Barrett, 2015; Hoemann 
et al., in press). The potency of words is discussed in 
Box 14, in the Supplemental Material. 

Summary. The pattern of findings from the studies conducted 
with remote samples replicates and underscores 
the pattern observed in samples of participants from 
larger, more urban cultural contexts: Asking perceivers to 
infer an emotion by matching a facial configuration to an 
emotion word selected from a small array of options, or 
telling participants a brief story about a typical instance 
of an emotion category and asking them to pick a facial 
configuration from an array of two or three photos, generally 
inflates agreement rates, producing evidence that is 
more likely to support the hypothesis of reliable emotion 
perception compared with data coming from less constrained 
response methods, such as free labeling (see 
Table 3). This is particularly true for studies that include 
only one pleasant emotion category (i.e., happiness) so 
that all foils differ from the target in valence. The robust 
reliability and specificity for inferring happiness from 
smiling observed in these studies may be the result of 
participants classifying valence rather than classifying 
emotion categories per se. Studies that use less constrained 
tasks that are designed to more freely discover 
how people perceive emotion instead yield evidence that 
generally fails to support the common view. Less 
constrained studies suggest that perceivers infer more 
than one emotion category from the same facial configuration, 
infer the same emotion category in a variety of 
different configurations, and often disagree about the set 
of emotion categories that they infer. Cultural variation in 
emotion perception is consistent with the variation we 
observed in studies of expression production (again, see 
Table 3) and is even consistent with the research on face 
perception, which itself is determined by experience and 
cultural factors (Caldara, 2017). 
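
The valence account raised in this summary can be illustrated with a short sketch. It assumes a hypothetical perceiver who registers only whether a face is pleasant or unpleasant and then guesses uniformly among the offered words that share that valence. The valence coding, the five-word array (surprise is omitted because its valence is ambiguous), and the function name are assumptions made for illustration, not a model taken from the studies reviewed.

```python
from fractions import Fraction

# Assumed valence coding for the words in a hypothetical array.
VALENCE = {
    "happiness": "pleasant",
    "anger": "unpleasant",
    "sadness": "unpleasant",
    "fear": "unpleasant",
    "disgust": "unpleasant",
}

ARRAY = list(VALENCE)  # the five words offered on every trial


def valence_only_accuracy(target, options, valence=VALENCE):
    """Probability that a valence-only perceiver picks the expected word.

    The perceiver knows the valence of the face but nothing else, so the
    choice is a uniform guess among options with the same valence.
    """
    same_valence = [word for word in options if valence[word] == valence[target]]
    return Fraction(1, len(same_valence))


accuracy = {word: valence_only_accuracy(word, ARRAY) for word in ARRAY}
nominal_chance = Fraction(1, len(ARRAY))  # the usual 1-in-5 chance baseline
print(accuracy, nominal_chance)
```

Such a perceiver matches smiles to happiness on every trial while choosing the expected word for a scowl only one time in four. Measured against the nominal one-in-five baseline, both figures count as above chance, although the perceiver has no knowledge of emotion categories.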

Studies of healthy infants and children 

Some scientists concur with the common view that 
infants can read specific instances of emotion in faces 
from birth (Haviland & Lelwica, 1987; Izard, Woodburn, 
& Finlon, 2010; Leppanen & Nelson, 2009; Walker- 
Andrews, 2005). However, it is difficult to ascertain 
whether infants and young children possess the various 
capacities required to perceive emotion per se: Simply 
detecting and discriminating facial movements is not 
the same as categorizing them to infer their emotional 
meaning. It is challenging to design well-controlled 
experiments that do a good job of distinguishing these 
two capacities. Infants are preverbal, so scientists use 
other measurement techniques, such as the amount of 
time an infant looks at a stimulus, to infer whether 
infants can discriminate one facial configuration from 
another, and ultimately, whether infants categorize 
those configurations as emotionally meaningful (for a 
brief explanation, see Box 15 in the Supplemental 
Material). 

This “looking” approach introduces several possible 
confounds because of the stimuli used in the experiments: 
Infants and children are typically shown photographs 
of the proposed expressive forms (similar to 
those presented in Figure 4; e.g., Leppanen, Richmond, 
Vogel-Farley, Moulson, & Nelson, 2009; Peltola, 
Leppanen, Palokangas, & Hietanen, 2008). Infants are 
more familiar with some of these configurations than 
with others (e.g., most infants are more familiar with 
smiling faces than with scowls or frowns), and familiarity 
is known to influence perception (see Box 15, in 
the Supplemental Material), making it difficult to know 
which features of a face are holding an infant’s attention 
(familiarity or novelty) and which might be the basis 
of categorization in terms of emotional meaning. The 
configurations proposed for each emotion category also 
differ in their perceptual features (e.g., the proposed 
expressions for fear and surprise contain widened eyes, 
whereas the proposed expression for sadness does 
not), contributing more ambiguity to the interpretation 
of findings. For example, when an infant discriminates 
smiling and scowling facial configurations, it is tempting 
to infer that the child is discriminating expressions 
of anger and happiness when in fact the target of 
discrimination may be the presence or absence of 
teeth in a photograph (Caron, Caron, & Myers, 1985). 
Moreover, the facial configurations in question are usually 
posed as exaggerated facial movements that are 
not typical of the expressive variation that children 
actually observe in their everyday lives (Grossmann, 
2010). Furthermore, unlike adults, infants may have had 
little or no experience with viewing photographs of 
anything, including heads of people with no bodies 
and no context. 

The most important and pervasive confound in 
developmental studies of emotion perception is that 
most studies are not designed to distinguish between 
whether infants and children (a) discriminate facial configurations 
according to their emotional meaning and 
whether they (b) discriminate affective features (pleasant 
vs. unpleasant; high arousal vs. low arousal; see 
Box 9 in the Supplemental Material). Often, a facial 
configuration that is intended to depict a pleasant 
instance of emotion (smiling in happiness) is compared 
with one that is intended to depict an unpleasant 
instance of emotion (e.g., scowling in anger, frowning 
in sadness, or gasping in fear), or these configurations 
are compared with a neutral face at rest (e.g., Leppanen 
et al., 2007, 2009; Montague & Walker-Andrews, 2001). 
(This problem is similar to the one encountered earlier 
in our discussion of emotion-perception studies in 
adults from small-scale societies, in which perceptions 
of valence can be confused with perceptions of emotion 
categories.) For example, in one study, 16- to 
18-month-olds preferred toys paired with smiling 
faces and avoided toys paired with scowling and 
gasping faces (N. G. Martin, Maza, McGrath, & Phelps, 
2014); this type of study cannot distinguish whether 
infants are differentiating pleasant from unpleasant, 
approach versus avoidance, or something about a 
specific emotion. 

Another study (Soken & Pick, 1999) reported that 
7-month-olds distinguished sadness and anger when 
looking at faces, but only when the faces were paired 
with vocalizations. What is unclear is the extent to 
which the level of arousal or activation conveyed in the 
acoustic signals was most salient to infants. A recent 
study suggested that 10-month-old infants can differentiate 
between the high arousal, unpleasant scowling 
and nose-wrinkled facial configurations that are proposed 
as expressions of anger and disgust, suggesting 
that they can categorize these two facial configurations 
separately (Ruba et al., 2017). Yet the scowling and 
nose-wrinkled facial configurations also differed in 
properties besides their proposed emotional meaning: 
scowling faces showed no teeth, but nose-wrinkled 
faces were toothy, and it is well known that infants use 
perceptual features such as “toothiness” to categorize 
faces (see Caron et al., 1985). If an infant looks longer 
at a (pleasant) smiling facial configuration after viewing 
several (unpleasant) scowling faces, this does not necessarily 
mean that the infant has discriminated between 
and understands “happiness” and “anger”; the infant 
might have discriminated positive from negative, affective 
from neutral, familiar from novel, the presence of 
teeth from the absence, less eye sclera from more, or 
even different amounts of contrast in the photographs. 
In the future, to provide a sound basis to infer that 
infants are processing specific emotional meaning, 
experiments must be designed to rule out the possibility 
that infants are categorizing facial configurations into 
different groupings using factors other than emotion. 

As a consequence of these confounds, there is still 
much to learn about the developmental course of 
emotion-perception abilities. By 3 months of age, 
infants can distinguish the facial features (the morphology) 
in the proposed expressive configurations for happiness, 
surprise, and anger; by 7 months, they can 
discriminate the features in proposed expressive configurations 
for fear, sadness, and interest. Left uncertain 
is whether, beyond just discriminating between the 
mere appearance of particular facial features, infants 
also understand the emotional meaning that is typically 
inferred from those features within their culture. By 7 
months of age, infants can reliably infer whether someone 
is feeling pleasant or unpleasant when facial configurations 
are accompanied by sensory information 
from the voice (Flom & Bahrick, 2007; Walker-Andrews 
& Dickson, 1997). Only a handful of studies have 
attempted to test whether infants can infer emotional 
meaning in facial configurations rather than just discriminating 
between faces with different physical 
appearances, but they report conflicting results 
(Schwartz, Izard, & Ansul, 1985; Serrano, Iglesias, & 
Loeches, 1992). One promising future direction involves 
measuring the electrical signals (event-related potentials) 
in infant brains as they view the proposed 
expressive configurations for anger and fear categories 
(e.g., Hoehl & Striano, 2008; Kobiella, Grossmann, Reid, 
& Striano, 2008). Both of these studies reported differential 
brain responses to the proposed facial configurations 
for anger and fear, but their findings did not 
replicate one another (and for certain measurements, 
they observed opposing effects; for a broader review, 
see Grossmann, 2015). 

Studies that measure a child’s ability to use an adult 
caregiver’s facial movements to resolve ambiguous or 
threatening situations, referred to as social referencing, 
have been interpreted as evidence of emotion perception 
in infants. One-year-olds use social referencing to 
stay in close physical proximity to a caregiver who is 
expressing negative affect, whereas infants are more 
likely to approach novel objects if the caregiver 
expresses positive affect (Carver & Vaccaro, 2007; 
Moses, Baldwin, Rosicky, & Tidball, 2001; Saarni, 
Campos, Camras, & Witherington, 2006). Similar results 
emerge from the caregiver’s tone of voice (Hertenstein 
& Campos, 2004; Mumme, Fernald, & Herrera, 1996). 
In fact, by 14 months of age, the positive or negative 
tone of a caregiver’s voice influences what an infant 
will touch even more than will a caregiver’s facial movements 
or the content of what the adult is actually saying 
(Vaish & Striano, 2004; Vaillant-Molina & Bahrick, 2012). 
These studies clearly suggest that infants can infer the 
valenced meaning of facial movements, at least when 
made by live (as opposed to virtual) people with whom 
they are familiar. But, again, these data do not help 
resolve what, if anything, infants infer about the emotional 
meaning of facial movements. 

Learning to perceive emotions. Children grow up in 
emotionally rich social environments, making it difficult 
to run experiments that are capable of testing the common 
view of emotion perception while also taking into 
account the possible roles for learning and social experience. 
Nonetheless, several themes have emerged in the 
scientific literature, all of which suggest a clear role for 
learning and context in children’s developing emotion- 
perception capacities. 

One hypothesis that continues to be strongly supported 
by experiments is that children’s capacity to 
infer emotional meaning in facial movements depends 
on context (the conditions surrounding the face that 
may convey information about a face’s meaning). For 
example, emotion-concept learning, as a potent source 
of internal context, shapes emotion-perception capacity 
(discussed in Boxes 10 and 16 in the Supplemental 
Material). There are also developmental changes in how 
people use context to shape their emotional inferences 
about facial movements. Children as young as 19 
months old can detect facial movements that 
are emotionally incongruent with a context (Walle & 
Campos, 2014). For example, when presented with 
adult facial configurations that are placed on bodies 
posing an emotional context (e.g., a scowling facial 
configuration placed on a body holding a soiled diaper), 
children (ages 4, 8, and 12 years) moved their 
eyes back and forth between faces and bodies when 
deciding how to label the emotional meaning of the 
faces, whereas adult participants directed their gaze 
(and overt visual attention) to the face alone, judging 
its emotional meaning in a way that was independent 
of the bodily context (Leitzke & Pollak, 2016). The 
youngest children were equally likely to base their 
labeling of the scene on face or context. The results of 
this experiment suggest that younger children devote 
greater attention to contextual information and actively 
cross-reference facial and contextual cues, presumably 
to better learn about and understand the emotional 
meaning of those cues. 

Another important source of context that shapes the 
development of emotion perception in children involves 
the broader environment in which children grow. Children 
who grow up in neglectful or abusive environments 
in which their emotional interactions with 
caregivers are highly atypical have a different developmental 
trajectory than do those growing up in more 
consistently nurturing environments (Bick & Nelson, 
2016; Pollak, 2015). Parents from these high-risk families 
produce unclear or context-inconsistent expressions 
of emotion (Shackman et al., 2010). Neglected 
children, who often do not receive sufficient social 
feedback, show delays in perceiving emotions in the 
ways that adults do (Camras, Perlman, Fries, & Pollak, 
2006; Pollak et al., 2000), whereas children who are 
physically abused learn to preferentially attend to and 
identify facial movements that are associated with 
threat, such as a scowling facial configuration (Briggs-Gowan 
et al., 2015; Cicchetti & Curtis, 2005; da Silva 
Ferreira, Crippa, & de Lima Osorio, 2014; Pollak, Vardi, 
Bechner, & Curtin, 2005; Shackman & Pollak, 2014; 
Shackman, Shackman, & Pollak, 2007). Abused children 
require less perceptual information to infer anger in a 
scowling configuration (Pollak & Sinha, 2002) and more 
reliably track the trajectory of facial muscle activations 
that signal threat (Pollak, Messner, Kistler, & Cohn, 
2009). Children raised in physically abusive environments 
also more readily infer anger and threat in ambiguous 
facial configurations (Pollak & Kistler, 2002) and 
then require greater effortful control to disengage their 
attention from signs of threat (Pollak & Tolley-Schell, 
2003) compared with children who have not been maltreated. 
This close attention to scowling faces with knitted 
eyebrows shapes how abused children understand 
what facial movements mean. For example, one study 
found that 5-year-old abused children tended to believe 
that almost any kind of interpersonal situation could 
result in an adult becoming angry; by contrast, most 
nonabused children understand that anger is likely to 
be particular to interpersonal circumstances (Perlman, 
Kalish, & Pollak, 2008). 

By 3 years of age, North American children not only 
start to show reliability in their emotion perceptions 
but also begin to show evidence of specificity. They 
understand that facial movements do not necessarily 
map on to emotional states, and how someone really 
feels can be faked or masked. Moreover, they know 
what facial movements are expected in a particular 
context and try to produce them despite their feelings. 
For example, the “disappointing gift” experiments 
developed by psychologist Pamela Cole and her colleagues 
demonstrate this well. In one study, preschool-age 
children were told they would be rewarded with a 
gift after they completed a task. Later, children received 
a beautifully wrapped package that contained a disappointing 
item, such as a broken pair of cheap sunglasses. 
When facing a smiling unfamiliar adult who 
had presented them with a gift, children forced themselves 
to smile (lip corner pull, cheek raise, and brow 
raise) and to thank the experimenter. Yet, although the 
children were smiling, they often kept their eyes 
focused down, slumped their shoulders, and made 
negative statements about the object, indicating that 
they did not, in fact, feel positive about the situation 
(Cole, 1986). Moreover, there was no difference in the 
behavioral responses of visually impaired children 
when receiving a disappointing gift (Cole, Jenkins, & 
Shott, 1989). Studies like this one provide a more 
implicit way of assessing children’s knowledge about 
emotion perception (i.e., it illustrates the inferences 
that children expect others to make from their own 
facial movements). 

It is possible that the frequency and type of facial 
input that people encounter influences their emotion 
categorizations. To test whether the statistical distribution 
of emotion input would influence how people 
construed boundaries between emotion categories, 
Plate, Wood, Woodard, and Pollak (2018) manipulated 
the frequency of this information to perceivers. Participants 
were asked to categorize facial morphs (from 
neutral to scowling) as being either “calm” or “upset.” 
A third of participants saw more scowling faces, a third 
saw more neutral faces, and the others saw faces that 
were equally distributed across scowling and neutral. 
Both school-age children and adults adjusted their emotion 
categories based on the frequency of the input 
they encountered. Those exposed to more scowling 
faces increased their threshold for categorizing a face 
as upset (therefore narrowing their category of “anger”). 
Those exposed to more calm faces decreased their 
threshold for categorizing a face as angry. These data 
are consistent with the idea that the frequency or commonness 
of a facial configuration in an observer’s environment 
influences his or her conception of an emotion 
(Levari et al., 2018; Oakes & Ellis, 2013), as well as the 
more general findings that expertise with faces influences 
identity perception (Beale & Keil, 1995; Jenkins, 
White, Monfort, & Burton, 2011; McKone, Martini & 
Nakayama, 2001; Viviani, Binda & Bosato, 2007; for a 
discussion of how familiarity is important for face perception, 
see Young & Burton, 2017). As a result, individual 
differences in emotion perception may be 
influenced by early experience that differs according 
to emotional input, reflecting the malleability of these 
categories. 
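
One simple way to picture this adjustment is a category boundary that tracks the average of the faces a perceiver has recently seen. The sketch below is a toy model built on stated assumptions, not the analysis used by Plate et al. (2018): morph levels run from 0 (neutral) to 1 (full scowl), the exposure lists are invented, and the boundary is simply set to the mean of the exposure set.

```python
def adapted_threshold(morph_levels):
    """Toy assumption: the category boundary sits at the mean morph level seen."""
    return sum(morph_levels) / len(morph_levels)


def categorize(level, threshold):
    """Label a face 'upset' only if its morph level exceeds the boundary."""
    return "upset" if level > threshold else "calm"


# Invented exposure sets (0 = neutral, 1 = full scowl); not data from any study.
balanced = [0.0, 0.25, 0.5, 0.75, 1.0]
more_scowling = [0.5, 0.75, 0.75, 1.0, 1.0]
more_neutral = [0.0, 0.0, 0.25, 0.25, 0.5]

probe = 0.6  # an ambiguous, moderately scowling morph
labels = {
    "balanced": categorize(probe, adapted_threshold(balanced)),
    "more_scowling": categorize(probe, adapted_threshold(more_scowling)),
    "more_neutral": categorize(probe, adapted_threshold(more_neutral)),
}
print(labels)
```

In this toy model, the same 0.6 morph is called “upset” after balanced or mostly neutral exposure but “calm” after mostly scowling exposure, which corresponds to the raised threshold reported for participants who saw more scowling faces.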

Summary. There is currently no clear evidence to support 
the hypothesis that infants and young children reliably 
and specifically infer emotion in the proposed 
expressive configurations for the anger, disgust, fear, 
happiness, sadness, and surprise categories (findings 
summarized in Table 3). A more plausible interpretation 
of the existing evidence is that young infants infer affective 
meaning, such as valence and arousal, from facial 
configurations. Data from infants and young children 
obtained using a variety of methods further suggest that 
emotion-perception abilities emerge and are shaped 
through learning in a social environment. These findings 
are consistent with the idea that the human face plays an 
important and privileged role in communicating importance 
or salience. But it is not clear that the expressive 
configurations proposed for specific emotion categories 
are similarly privileged in this way. 

Summary of scientific evidence on the 
perception of emotion in faces 

The scientific findings on perception studies generally 
replicate those from production studies in failing to 
strongly support the common view. The one exception 
to this overall pattern of findings is seen in studies that 
ask participants to match a posed face to an emotion 
word or brief scenario. This method produces evidence 
that can support the common view, even when it is 
applied to completely novel emotion categories with 
made-up expressive cues (Hoemann et al., 2018), opening 
up interesting questions about the psychological 
potency of the elements that make up choice-from- 
array designs (e.g., the emotion words embedded in 
the task or the choice of foils on a given trial). These 
findings reinforce our earlier conclusion that such terms 
as “facial configuration” or “pattern of facial movements” 
or even “facial actions” are preferred to more 
loaded terms such as “emotional facial expression,” 
“emotional expression,” or “emotional display,” which 
can be misleading at best and incorrect at worst. 

Summary and Recommendations 

Evaluation of the empirical evidence 

The common view that humans around the world reliably 
express and recognize certain emotions in specific 
configurations of facial movements continues to echo 
within the science of emotion, even as scientists increasingly 
acknowledge that anger, sadness, happiness, and 
other emotion categories are more variable in their 
facial expressions. This entrenched common view does 
more than guide the practice of science. It influences 
public understanding of emotion and hence education, 
clinical practice, and applications in industry. Indeed, 
it reaches into almost every facet of modern life, including 
emoticons and movies. However, there is insufficient 
evidence to support it. People do express instances 
of anger, disgust, fear, happiness, sadness, and surprise 
with the hypothesized facial configurations presented 
in Figure 4 at above chance levels, suggesting that those 
facial configurations sometimes serve as expressions of 
emotion as proposed. However, the reliability of this 
finding is weak, and there is evidence that the strength 
of support for the common view varies systematically 
with the research methods used. The strongest support 
for the common view (found in data from urban, 
industrialized, or developed samples completing 
choice-from-array tasks) does not show robust generalizability. 
Evidence for specificity is lacking in almost 
all research domains. A summary of the scientific evidence 
is presented in Table 3. 

These research findings do not imply that people 
move their faces randomly or that the configurations 
in Figure 4 have no psychological meaning. Instead, 
they reveal that the facial configurations in question 
are not “fingerprints” or diagnostic displays that reliably 
and specifically signal particular emotional states 
regardless of context, person, and culture. It is not possible 
to confidently infer happiness from a smile, anger 
from a scowl, or sadness from a frown, as much of 
current technology tries to do when applying what are 
mistakenly believed to be the scientific facts. 

Instead, the available evidence from different populations 
and research domains (infants and children, 
adults living in industrialized countries and in remote 
cultures, and even individuals who are congenitally 
blind) overwhelmingly points to a different conclusion: 
When facial movements do express emotional states, 
they are considerably more variable and dependent on 
context than the common view allows. There appear 
to be many-to-many mappings between facial configurations 
and emotion categories (e.g., anger is expressed 
with a broader range of facial movements than just a 
scowl, and scowls express more than anger). A scowling 
facial configuration may be an expression of anger 
in the sense of being a part of anger in a given instance. 
But a scowling facial configuration is not the expression 
of anger in any generalizable or universal way (there 
appear to be no prototypical facial expressions of emotions). 
Scowling facial configurations and the others in 
Figure 4 belong to a much larger repertoire of facial 
movements that express more than one emotion category, 
and also nonemotional psychological meanings, 
in a way that is tailored to specific situations and cultural 
contexts. The face is a powerful tool for social 
communication (Jack & Schyns, 2017). Facial movements, 
like reflexive and voluntary motor movements 
(L. F. Barrett & Finlay, 2018), are strongly context-dependent. 
Recent evidence suggests that people’s categories 
for emotions are flexible and responsive to the 
types and frequencies of facial movements to which 
they are exposed in their environments (Plate, Wood, 
Woodard, & Pollak, 2018). 

The degree of variation suggested by the published 
evidence goes well beyond the hypothesis that the 
facial configurations in Figure 4 are prototypes or typical 
expressions and that any observed variations are 
merely the result of cultural accents, display rules, suppression 
or other regulatory strategies, differences in 
induction methods, measurement error, or stochastic 
noise (as proposed by various scientists, including 
Ekman & Cordaro, 2011; Elfenbein, 2013, 2017; 
Levenson, 2011; Matsumoto, 1990; Roseman, 2001; 
Tracy & Randles, 2011). Instead, the facial configurations 
in Figure 4 are best thought of as Western gestures, 
symbols, or stereotypes that fail to capture the 
rich variety with which people spontaneously move 
their faces to express emotions in everyday life. A stereotype 
is not a prototype. The distinction is an important 
one, because a prototype is the most frequent or 
typical instance of a category (Murphy, 2002), whereas 
a stereotype is an oversimplified belief that is taken as 
generally more applicable than it actually is. 

The conclusion that emotional expressions are more 
variable and context-dependent than commonly 
assumed is also mirrored by the evidence from physiological
changes (e.g., heart-rate and skin-conductance
measures; see Box 8 in the Supplemental Material) and
even in evidence on the brain basis of human emotion
(Clark-Polner et al., 2017). The task of science is to
systematically document these context-dependent patterns,
as well as to understand the mechanisms that
cause them, so that we can explain and predict them.
Clearly, the face is a rich source of information that
plays a crucial role in guiding social interaction. Facial
movements, when measured in a high-dimensional
dynamic context (i.e., in a deeply multivariate way,
sampling across many measurement domains within an
emoter and the spatiotemporal context), may serve the
diagnostic purpose that many consumers of emotion
science are looking for (in which context can be a
cultural context, a specific situation, a person’s learning
history or momentary physiological state, or even the
temporal context of what just took place a moment
ago; L. F. Barrett, 2017b; L. F. Barrett, Mesquita, &
Gendron, 2011; Gendron, Mesquita, & Barrett, 2013).

A note on the scientific literature 

Our review identified several broad problems that lurk 
within the scientific research on facial expressions and 
that may cause considerable misunderstanding and confusion
for consumers of this research. First, statistical
standards are commonly adopted that do not translate
well for applying emotion research to other domains,
applied or scientific. Showing that people frown when
sad or scowl when angry with greater statistical reliability
than would be expected by chance may be a
scientific finding that warrants publication in a peer-reviewed
journal, but above-chance responding is often
low in absolute terms, making broad conclusions
impossible, particularly for translation to domains of
life in which a person’s outcomes can be influenced by
the emotional meaning that perceivers infer. Making
inferences on the basis of statistical reliability without
properly accounting for actual effect sizes, specificity,
and generalizability is similarly problematic.
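
The gap between above-chance reliability and diagnostic value can be illustrated with a small calculation. The sketch below applies Bayes' rule to purely hypothetical probabilities; none of the numbers are estimates from the studies reviewed here, and the function name `reverse_inference` is introduced only for illustration.

```python
# Illustrative only: every probability below is a made-up assumption,
# not an estimate from the studies reviewed in this article.

def reverse_inference(p_cue_given_emotion, p_cue_given_other, p_emotion):
    """Return p(emotion | cue) via Bayes' rule.

    p_cue_given_emotion: forward inference, e.g. p(scowl | anger)
    p_cue_given_other:   p(scowl | not anger), which governs specificity
    p_emotion:           base rate of the emotion category
    """
    p_cue = (p_cue_given_emotion * p_emotion
             + p_cue_given_other * (1.0 - p_emotion))
    return p_cue_given_emotion * p_emotion / p_cue


# Hypothetical values: scowling is three times as likely during anger
# as otherwise (a reliable difference), yet anger is assumed to be
# present in only 10% of sampled moments.
p_anger_given_scowl = reverse_inference(
    p_cue_given_emotion=0.30,
    p_cue_given_other=0.10,
    p_emotion=0.10,
)
print(round(p_anger_given_scowl, 2))
```

Under these assumed numbers, a scowl would indicate anger in only about one of four cases, even though scowling is reliably more frequent during anger than otherwise. The point of the sketch is that reliability alone, without specificity and base rates, says little about what a perceiver can safely infer.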

Second, even studies that surmount these common 
shortcomings often have a mismatch between what is 
claimed in their conclusions (or what others claim in 
reviews or citations of those primary research studies) 
and what inferences can, in fact, be reasonably supported
by the results. This is particularly problematic,
because the perpetuation of the common view, and its 
applications, may be the result of superficial readings 
of abstracts or secondary sources rather than in-depth 
evaluation of the primary research. 

Third, the mismatch between observations and interpretations
often results from problems in how studies
are designed: the particular stimuli used, the tasks used,
and the statistical analyses are critically important and
constrain what can be observed and inferred in the first
place. Unfortunately, the published research on emotional
expressions and emotion perception is rarely
designed to systematically assess the degree of expressive
variation. Furthermore, this research often confounds
the measurements made in an experiment with
the interpretation of those measurements, referring to
facial movements as “emotional displays,” “emotional
expressions,” or even “facial expressions” rather than
“facial configurations,” “facial movements,” or “facial
actions”; referring to people “detecting” or “recognizing”
emotion rather than “perceiving” or “inferring” an emotional
state on the basis of some set of cues (facial movements,
vocal acoustics, body posture, etc.); and referring
to “accuracy” rather than “agreement” or “consensus.”

A note on other emotion categories 

Our conclusions most directly challenge what we have
termed the “common view”: that a scowling facial configuration
is the expression of anger, a nose-wrinkled
facial configuration is the expression of disgust, a gasping
facial configuration is the expression of fear, a smiling
facial configuration is the expression of happiness,
a frowning facial configuration is the expression of
sadness, and a startled facial configuration is the
expression of surprise (see Fig. 4). By necessity, we
focused our review of evidence on these 6 emotion
categories, rather than the more than 20 emotion categories
that are currently being studied, because studies
on these 6 are far more numerous than studies of other
emotion categories. Nonetheless, some scientists claim
that each of these other emotion categories has a prototypical,
universal expression, facial or otherwise, that
is modified or accented by culture (e.g., Cordaro et al.,
2018; Keltner et al., 2019). In our view, such claims rest
on evidence that is subject to the same critique that we
offered for the research reviewed in detail here. In
short, even though our review focused on the six emotion
categories that are sometimes referred to as “basic
emotions,” our observations and conclusions generalize
to studies of other emotion categories that use similar
methods.

Recommendations for consumers of 
emotion research on applying the 
scientific findings 

Presently, many consumers of emotion research assume 
that certain questions about emotional expressions have 
been answered satisfactorily when in fact this is not the 
case. Technology companies, for example, are spending 
millions of research dollars to build devices to read 
emotions from faces, erroneously taking the common 
view as a fact that has strong scientific support. A more 
accurate description, however, is that such technology 
detects facial movements, not emotional expressions.
Corporations such as Amazon are exploring virtual- 
human technology to interface with consumers. Virtual 
humans are used to educate children, train physicians, 
and train the military as well as infer psychological 
disorders, and perhaps will eventually even be used to 
offer treatments for psychiatric illnesses. At the moment,
the science of emotion is ill-equipped to support any
of these initiatives. So-called emotional expressions are
more variable and context-dependent than originally
assumed, and most of the published research was not
designed to probe this variation and characterize this
context dependence. As a consequence, as of right now,
the scientific evidence offers less actionable guidance
to consumers than is commonly assumed.

Table 6. Recommendations for Reading Scientific Studies About Emotion

1. Take note of whether an experiment is studying expressive stereotypes or more variable facial movements.

2. Take note of data on specificity and generalizability; do not focus solely on reliability at above chance levels.

3. Make a distinction between the data in an experiment (what was measured) and how those data are interpreted.

4. Translate “emotional expressions” or “emotional displays” into “facial movements.”

5. Translate “emotion recognition” into “emotion perception” or “emotion inference.”

6. Translate “accuracy” to “agreement,” “consensus,” or “reliability.”

7. Give more weight to studies that measure facial movements or study the perception of facial movements in more naturalistic
settings.

8. Take note of studies that measure or manipulate context.

9. Field studies of people from small-scale, remote cultures are often less well-controlled than studies conducted in the
laboratory, but they are invaluable in the information that they provide and should be appreciated.

10. Remember that emotions are not understood as internal states in all cultures. In some cultures they are understood as situated
actions.

11. Do not skip the Method and Results sections and go directly to the Discussion to learn the results of an experiment. It is
important to know what was measured and observed, not just how scientists interpreted their measurements.

In fact, our review of the scientific evidence indicates 
that very little is known about how and why certain 
facial movements express instances of emotion, particularly
at a level of detail sufficient for such conclusions
to be used in important, real-world applications.
To help consumers navigate the science of emotion, we 
offer some tips for how to read experiments and other 
scientific articles (Table 6). 

More generally, tech companies may well be asking 
a question that is fundamentally wrong. Efforts to simply
“read out” people’s internal states from an analysis
of their facial movements alone, without considering
various aspects of context, are at best incomplete and
at worst entirely lack validity, no matter how sophisticated
the computational algorithms. These technology
developments are powerful tools to investigate the 
expression and perception of emotions, as we discuss 
below. Right now, however, it is premature to use this 
technology to reach conclusions about what people 
feel on the basis of their facial movements—which 
brings us to recommendations for future research. 

Recommendations for future scientific 
research 

Specific, concrete recommendations for future research
to capitalize on the opportunity offered by current challenges
can be found in Table 7, but we highlight a few
general points here. First, the expressive stereotypes
that summarize the common view, such as those
depicted in Figure 4, are ubiquitous in published
research. It is time to move beyond a science of stereotypes
to develop a science of how people actually move
their faces to express emotion in real life, and the processes
by which those movements carry information
about emotion to someone else (a perceiver). (For a
discussion of information theory as applied to emotional
communication, see Box 16 in the Supplemental
Material.) The stereotypes of Figure 4 must be replaced
by a thriving scientific effort to observe and describe
the lexicon of context-sensitive ways in which people
move their facial muscles to express emotion, and the
discovery of when and how people infer emotions in
other people’s facial movements.

New research on emotion should consider sampling
individuals deeply, with high dimensional measurements,
across many different situations, times of day,
and so forth: a Big Data approach to learning the
expressive repertoires of individual people. The diagnosis
of an instance of emotion might be improved by
combining many features, even those that are weakly
diagnostic on their own, particularly if the analysis is
conducted in a person-specific (idiographic) way (e.g.,
Rudovic, Lee, Dai, Schuller, & Picard, 2018; Yin et al.,
2018). In the ideal case, videos of people in natural
situations could be quantified by automated algorithms
for various physical features, such as facial movements,
posture, gait, and tone of voice. To this, scientists
could add the sampling of other physical features,
such as ambulatory monitoring of ANS changes to
sample the internal milieu of people’s bodies as they
dynamically change over time, ambulatory eye-tracking
to assess gaze and attention, ambulatory brain
imaging (e.g., electroencephalography), and optical
brain imaging (e.g., functional near-infrared spectroscopy).
Only a highly multivariate set of measures is
likely to work to classify instances of emotion with
high reliability and specificity.
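
Why combining weakly diagnostic features might help can be sketched with a textbook signal detection calculation. The example below is an idealized illustration, not a method from the cited studies: it assumes two equally likely categories, features that are statistically independent, and equal-variance Gaussian distributions with the same small separation on every feature. The function name `ideal_observer_accuracy` and the separation value 0.2 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ideal_observer_accuracy(d_single, n_features):
    """Best achievable two-category accuracy when n_features cues are pooled.

    Assumptions (all idealized): equal base rates, independent features,
    equal-variance Gaussian distributions, and the same standardized
    separation d_single between the categories on every feature.
    """
    # Pooling n independent cues scales the separation by sqrt(n).
    d_combined = d_single * sqrt(n_features)
    # With equal base rates, the optimal criterion sits midway between
    # the two distributions, giving accuracy Phi(d / 2).
    return NormalDist().cdf(d_combined / 2.0)


# Hypothetical separation of 0.2 standard deviations per feature.
for n in (1, 10, 100):
    print(n, round(ideal_observer_accuracy(0.2, n), 3))
```

Under these assumptions, accuracy rises from roughly 54% with a single feature to roughly 84% with 100 features. Real measures such as facial movements, posture, and vocal acoustics are correlated with one another, so actual gains would be smaller, and the calculation says nothing about whether the criterion used to label an instance of emotion is itself valid.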

The failure to find reliable “fingerprints” for emotion
categories, including the lack of reliable facial movements
to express these categories, may stem, at least in
part, from the same source: Scientific approaches have
ignored substantial, meaningful variability attributable
to context (for recent computer innovations, see Kosti,
Alvarez, Recasens, & Lapedriza, 2017). There is Bluetooth
technology to capture the physical spaces people
inhabit (which can be quantified for various structural 
and social descriptive features, such as the extent of 
their exposure to light and noise), whether they are 
with another person, how that person reacts, and so 
on. In principle, rich, multimodal observations could 
be available from videos; when time-synchronized with 
the other physical measurements, such video could be 
extremely useful in understanding the conditions when 
certain facial movements are made and what those 
movements might mean in a given context. Naturally, 
Big Data in the absence of hypotheses is not necessarily 
helpful. 

Participants could be offered the opportunity to
annotate their videos with subjective ratings of the features
that describe their experiences (whether or not
they are identified as emotions). Candidate features are
affective properties such as valence and arousal (see
Box 9 in the Supplemental Material), appraisals (i.e.,
descriptions of how a situation is experienced; e.g.,
Clore & Ortony, 2013; see L. F. Barrett, Mesquita, Ochsner,
& Gross, 2007; Gross & Barrett, 2011), and emotion-related
goals. These additional psychological
features have the potential to add higher dimensional
details to more specifically characterize facial movements
and what they mean. Such an approach introduces
various technical and modeling challenges, but
this sort of deeply inductive approach is now within
reach.

Another opportunity for high dimensional sampling 
of emotional events involves interactions with virtual 
humans. Because virtual humans can realize contingent 
behavior in rich social interactions under strict and 
precise experimental control, they can provide a richer, 
more natural context in which to study emotional 
expressions and emotion perception than may be true 
for traditional laboratory studies. In addition, they do 
not suffer from the loss of experimental control that 
limits causal inferences from ethological studies. 

To date, this potential has not yet been exploited to
explore the reliability and specificity in context-sensitive
relations between facial movements and mental events.
As we noted earlier, most of the virtual systems are now
designed to teach people a variety of skills, where the
goal is not to assess how well participants perceive
emotions in facial movements under realistic, socially
ambiguous conditions, but instead to program expressive
behaviors into virtual humans that will motivate
people to learn the needed skills. In these experiments,
the psychological realism of facial movements is often
secondary to the primary goals of the experiment. A
scientist might even program a virtual human with
behavior or appearance that is unnatural or infeasible
for a human (i.e., that is supernormal) so that a participant
can unambiguously interpret and be influenced
by the agent’s actions (for relevant discussion, see
D. Barrett, 2010; Tinbergen, 1953).

Nonetheless, the scientific approach of observing
people as they interact with artificial humans holds
great promise for understanding the dynamics and
mechanisms of emotion perception and may get us
closer to understanding human emotion perception in
everyday life. Virtual humans are vivid. Unlike more
passive approaches to evoking emotion, such as viewing
videos or images of facial configurations, a virtual
human engages a human participant in a direct, social
interaction to elicit perceptual judgments that are either
directly reported or inferred from behaviors measured
in the participant. Virtual humans are also highly controllable,
allowing for precise experimentation
(Blascovich et al., 2002). A virtual human’s facial movements
and other details can be repeated across participants,
offering the potential for robust and replicable
observations. Numerous studies have demonstrated that
humans are influenced by them (e.g., Baylor & Kim,
2008; Krumhuber et al., 2007; McCall, Blascovich,
Young, & Persky, 2009). For example, human learners
are more engaged by virtual agents who move their
faces (and modulate their voices), leading the learners
to an increased sense of self-efficacy (Y. Kim, Baylor,
& Shen, 2007). As a consequence, virtual humans
potentially allow for the study of emotion in a rich
virtual ecology, a form of synthetic in vivo experimentation
(Marsella & Gratch, 2016).

When combined with the high dimensional sampling
we described earlier, there is the potential to revolutionize
our understanding of emotional expressions by asking
questions that are different from those encouraged
by the common view. Automated algorithms using data
captured from videos offer substantial improvements
with a data-driven, unsupervised approach. The result
could be robust descriptions of the context-sensitive
nature of emotional expressions, descriptions that are currently
missing and that would set the stage for a more mechanistic,
causal account of emotions and their expressions.

An ethology of emotions and their expressions can 
also be pursued in the lab. Experiments can go beyond 
a study of how people move their faces in a single situation
chosen to be most typical of a given emotion
category. Most studies to date have been designed to 
observe facial movements in only the most stereotypic 
Table 7. Recommendations for Future Research


General Recommendations 

• Take chances on studies that attempt to go beyond merely supporting the common view of emotion. 

• Support papers that attempt to study facial movements in real life, measuring context, sampling across cultures (even though 
these studies are often less well controlled than studies in the laboratory), or using facial stimuli that are less familiar to 
reviewers than common-view stimulus sets. 

• Prioritize multidisciplinary studies that combine classical psychology methods with cognitive neuroscience, machine learning, 
and so forth. 

• Support larger scale studies that bridge the lab and the world, that study individual people across many contexts, and that 
measure emotional episodes in high dimensional detail, including physical, psychological, and social features. 

• Support the development of computational approaches. 

• Create teams that pair psychologists and cognitive scientists trained in the psychology of emotion with engineers and computer 
scientists. 

• Increase opportunities to test innovative methods and novel hypotheses, with the acknowledgment that such approaches are 
likely to elicit resistance from established scientists in the field of emotion. 

• Generate more studies to identify the underlying neural mechanisms of the production and perception of facial movements. 

• Direct funding to thornier but necessary new questions and be critical of projects that perpetuate past errors in emotion 
research. 

• Direct healthy skepticism to tests, measures, and interventions that rely on assumptions about “reading facial expressions of
emotion” that seem to ignore published evidence and/or ignore integration of contextual information along with facial cues. 

• Develop systematic, precise ways to describe and/or manipulate the dynamics of specific facial actions. 

Problem Recommendation 


Limitations in stimulus selection can 
bias results. 


Little is known about the dynamics of 
emotional expression and emotion 
inferences. 


The role of context is hotly debated 
but rarely measured in sufficient 
detail. 


Cross-cultural studies can provide 
powerful insights but are limited in 
number and scope. 


Stimulus selection 

• For production studies, ensure that multiple stimuli per emotion category are used to 
evoke an emotion. 

• Measure emotional episodes in a multimodal way and attempt to discover explicit 
criteria for when an instance of emotion is present or absent. Such discovery may 
require within-person approaches. Consider quantifying the presence or absence 
probabilistically. 

• For perception studies, incorporate images from the wild (e.g., from multiple Internet 
sources) to capture the full range of facial movements that humans produce in their 
everyday lives. 

• For both production studies (in which stimuli are designated to evoke emotion) and 
perception studies, build variation into stimulus sets so conclusions about emotion 
categories are more generalizable to the real world. Consider randomly sampling a 
variety of stimuli for a given category and treating stimuli as a random variable. 

• For production studies, code the temporal dynamics of facial movements. 

• Attempt to determine the apex of facial movements, changes to AUs as movements 
emerge and recede, and whether the kinematics of distinct AUs are similar or different 
across sequences or phases of facial movements. 

• Ensure sufficient temporal resolution to allow for event segmentation to be assessed
in perception studies. 

• For perception studies, use dynamic images rather than rely on still images. 

• Quantify, as best as possible, participants’ degree of exposure to and knowledge 
of Western cultural practices and norms, as well as the amount and type of formal 
schooling made available to participants. 

• Measure or manipulate the context in which facial movements occur. 

• Manipulate (or at least measure) the context in which facial movements are perceived 
to evaluate whether data are truly stimulus-specific or influenced by the entire scene 
(i.e., present facial movements in their spatiotemporal context). 

• Explicitly acknowledge that the experimental method is itself a psychologically potent 
context that is likely to influence responses. 

Sample selection 

• Harness technology to collect larger numbers of images and video sequences of facial 
movements across cultures. 

• Remember that emotions are understood as situated actions and not as mental events 
or feelings in many cultures; also, mental inferences may be understood differently in 
different cultures. 




Measurement and interpretation 
of emotion are often blurred in 
research studies. 


New insights about emotion are 
constrained by reliance on, and 
assumptions about, traditional 
English emotion categories. 


Findings are limited by a failure to 
consider issues related to forward
and reverse inference. 


Note: AU = action unit. 


Task, method, and design 

• Contrast more than one emotion category with a baseline, so that conclusions about a 
specific emotion category are not drawn from a comparison of an emotion condition 
and a no-emotion condition. 

• Compare multiple emotion categories to no-emotion categories in a given study. 

• Unless a study design is completely data-driven, explicitly state the theoretical priors 
of the research team. Consider stating whether a study is designed to discover new 
knowledge or test existing hypotheses about the traditional English categories. Both 
approaches are valid but should be clearly articulated. 

• Explicitly hypothesize how task design and method might influence responses. 

• Sample a broader number of emotion categories than those used in prior research (move
beyond the 20 or so English categories that are now becoming common in research).
Consider sampling non-English emotion categories. Test for variations within these 
categories and similarity across categories. 

Data analysis and interpretation 

• Test both reliability and specificity when presenting results about facial movements
(expression production) and emotion inferences (emotion perception). 

• Use formal signal detection analytics and information theory measures rather than relying
on frequency or levels of agreement.

• Consider using Bayesian methods so that the null hypothesis can be tested directly. 

• Use unlabeled classification approaches to discover emotion categories and their 
expressive forms, rather than continuing to ask whether other cultures express and 
perceive emotions in a manner that is similar to that of people in the United States. 


situations. Future studies should examine emotional 
expression and perception across a range of situations 
that vary systematically in their physical, psychological, 
and social features. Furthermore, scientists should aim 
to understand the various ways that humans acquire 
the skills to express and perceive emotion, as well as 
the conditions that can impair the development of these 
processes. 

The shift toward more context-sensitive scientific 
studies of emotion has already begun (see Box 3 in the 
Supplemental Material), but it currently falls short of 
what we are recommending. Nonscientists (and some 
scientists) still anchor on the common view and only 
slowly shift away from it (Tversky & Kahneman, 1974; 
T. D. Wilson, Houston, Etling, & Brekke, 1996). The 
pervasiveness of the common view supports strong 
convictions about what faces signal, and people often 
continue to hold to those convictions even when they 
are demonstrably wrong (L. F. Barrett, 2017b; Todorov, 
2017). Such convictions reflect cultural beliefs and stereotypes,
however. This state of affairs is not unique to
the science of emotional expression or to the science 
of emotion more generally (Kuhn, 1962). 

In our view, the scientific path forward begins with 
the explicit acknowledgment that we know much less 
about emotional expressions and emotion perception 
than we thought we did, providing an opportunity to 
cultivate the spirit of discovery with renewed vigor and 
take scientific discovery in a new direction (Firestein, 
2012). With this context of discovery comes the
sobering realization that those of us who cultivate the
science of emotion and the consumers who use this
research should seriously question the assumptions of
the common view and step back from what we think
we know about reading emotions in faces. Understanding
how best to infer someone’s emotional state or
predict someone’s future actions from their facial movements
awaits the outcomes of future research.

Appendix A: Glossary 

Accuracy/accurate: The extent to which a participant’s 
performance corresponds to what is hypothesized in a 
given experimental task. Critically, this requires that the 
hypothesized performance can be measured in a perceiver- 
independent way that is not subject to the inferences of the 
experimenter. 

Affect: A general property of experience that has at least 
two features: pleasantness or unpleasantness (valence) 
and degree of arousal. Affect is part of every waking 
moment of life and is not specific to instances of emotion,
although all emotional experiences have affect at
their core. 

Agreement: The extent to which two people provide 
consistent responses; high agreement produces high 
intersubject consistency. Percentage agreement is not 
the same as percentage accuracy, because the former is 
more perceiver-dependent than the latter. 

Appraisal: A psychological feature of experience (e.g., a 
situation is experienced as novel). Some scientists use the 
word appraisal to additionally refer to a literal cognitive
mechanism that causes a feature of experience (e.g., an
evaluation or judgment of whether a situation is novel).

Approach/avoidance: A fundamental dimension of
motivated behavior. It is different from valence, which is
a dimension of experience rather than of behavior.

Category/categorization: The psychological grouping
of objects, people, or events that are perceived to be similar
in some way. Categorization may occur consciously or
unconsciously. It may be explicit (as when applying a verbal
label to instances of the grouping) or implicit (treating
instances the same way or behaving toward them in the
same way).

Choice-from-array tasks: Any judgment task that asks 
research participants to pick a correct answer from a small 
selection of options provided by the experimenter. For 
example, in the study of emotion perception, participants 
are often shown a posed facial configuration depicting an 
emotional expression (e.g., a scowl), along with a small 
selection of emotion words (e.g., “angry,” “sad,” “happy”) 
and asked to pick the word that best describes the face.

Common view: In this article, the most predominant
view about how emotions are related to facial movements.
Although it is difficult to quantify, we characterize
the common view through examples (e.g., a
Google search; see Box 1 in the Supplemental Material).
The common view holds that (a) certain emotion categories
reliably cause specific patterns of facial muscle movements,
and (b) specific configurations of facial muscle
movements are diagnostic of certain emotion categories.
See Figure 4.

Conditional probability: The probability that an event
X will occur given that another event Y has already
occurred, or p(X | Y). If X is a frown and Y is sadness,
then p(frown | sadness) is the conditional probability that
a person will frown when sad. See also consistency,
forward inference, and reverse inference.

Configuration of facial-muscle movements/facial
configuration: A pattern of visible contractions of multiple
muscles in the face. Configurations can be described
with FACS coding. Not synonymous with facial expression,
which requires an inference about the causes or
meaning of the facial configurations.

Confirmatory bias: The tendency to search for, remember,
or believe evidence that is consistent with one’s
existing beliefs or hypotheses rather than remain open to
evidence inconsistent with one’s priors.

Congenitally blind: People who are born without 
vision. The use of this term in the literature is considerably
heterogeneous. Some people are truly blind from
the moment they are born, but others have severe visual 
impairments short of complete blindness or they become 
blind in infancy. If the cause is peripheral (in the eyes 
rather than the brain), such individuals may still be able 
to think and imagine very similarly to sighted individuals. 


Consistency: An outcome that does not vary greatly
across time, context, and different individuals. Consistency
is not accuracy (e.g., a group of people can consistently
believe something that is wrong). Also referred
to as reliability.

Discrimination: In psychophysics, the action of judging 
that two stimuli are different from one another. This is 
separate from pinpointing what they are (identification) 
or what they mean (recognition). 

Emotional episode: A window of time during which
an emotional instance unfolds. Often, but not always,
accompanied by an experience of emotion, and sometimes,
but not always, involves an emotional expression.

Emotional expression: A facial configuration, bodily
movement, or vocal expression that reliably and specifically
communicates an emotional state. Many perceived
emotional expressions are in fact errors of reverse inference
on the part of perceivers (e.g., an actor crying when
not sad).

Emotional granularity: Experiencing or perceiving
emotions according to many different categories. For
instance, low emotional granularity involves understanding
terms such as angry, sad, and afraid as synonyms of
unpleasant; high emotional granularity involves understanding
terms such as frustrated, irritated, and enraged
as distinct from each other and from angry.

Emotional instance/instance of emotion: An event 
categorized as an emotion. For example, an instance of 
anger is the categorization of an emotional episode of 
anger. In cognitive science, an instance is called a token 
and the category is called a type. So, an instance of anger 
is a token of the category anger. (See emotional episode.)

Facial action coding system (FACS): A system to
describe and quantify visible human facial movements.

Facial expression: A facial configuration that is
inferred to express an internal state.

Facial movement: A facial configuration that is objectively
described in a perceiver-independent way. This
description is agnostic about whether the movement
expresses an emotion and does not use reverse inference.
FACS coding is used to describe facial movements.

Forward inference: Inferring an effect from knowing its
cause. An example would be the conditional probability
of observing a frown if we know somebody is angry,
p(frown | anger).

Free-labeling task: An experimental task that is not
a forced choice, but in which the participants generate
words or other responses of their choosing.

Generalize/generalizability: The replication of research
findings across different settings, samples, or methods.
Generalizability can be weak (when a finding can be replicated
to a limited extent) or strong (when it can be replicated
across very different methods and cultures).

Mental inference/mentalizing: Assigning a mental cause
to actions; also sometimes referred to as theory of mind.
The reverse inference of inferring emotions from seeing
facial movements can be an example of mentalizing.

Meta-analysis: A method for statistically combining
findings from many studies.

Multimodal: Combining information from more than 
one of the senses (e.g., vision and audition). 

Null hypothesis: The hypothesis or default position that 
there is no relationship between a set of variables. Equivalent
to observing effects that would occur by chance
(i.e., what would obtain if observations are random or 
permuted). Consequently, if the null hypothesis is true, 
the distribution of p values is uniform (every possible 
outcome has an equal chance). 
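
The idea of "what would obtain if observations are random or permuted" can be illustrated with a permutation test. The following is a minimal sketch, not a method taken from this article: the function name, the choice of the difference in group means as the test statistic, and the default number of permutations are all assumptions made for illustration.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Hypothetical helper: shuffling the group labels simulates the null
    hypothesis that group membership is unrelated to the measurements.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign observations to groups at random
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)
```

A small p value means that few random relabelings produce a difference as large as the one observed; when the null hypothesis is true, p values from repeated experiments of this kind are spread approximately evenly between 0 and 1.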

Perceiver-dependent: An observation that depends on
human judgment. Perceiver dependency can produce
conclusions that are consistent across people, but consistency
does not assure accuracy or validity.

Perceiver-independent: An observation that does not
depend on human judgment (although its interpretation
will depend on human inference). Some philosophers
argue that all observations require human judgment,
but there are degrees of dependency. Judging whether a
flower vase is rectangular or oval is relatively
perceiver-independent, whereas judging whether it looks nice is
perceiver-dependent.

Perceptual matching task: An experimental task that 
requires research participants to judge two stimuli, such 
as two facial configurations, as similar or different. This 
requires only discrimination, not categorization,
recognition, or naming.

Prototype: The most frequent or most typical instance
of a category. Distinct from stereotype: A group of people
may have a perceiver-dependent stereotype that is
an inaccurate representation of the prototype.

Recognize/recognition: Acknowledging something’s
existence (which is confirmed to exist by
perceiver-independent means). Contrasted with perception (which
involves inference and interpretation).

Reliable/reliability: An observation that is repeatable
across time, context, and individuals. See consistency.

Replicable: The extent to which new experiments come
to the same conclusions as a previous study. Strong replications
generalize well: Similar conclusions are obtained
even when the new experiments use different subject
samples, stimuli, or contexts.

Reverse correlation: A psychophysical, data-driven 
technique for deriving a representation of something 
(e.g., an image of a facial configuration) by averaging 
across a large number of judgments. 
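
As a rough illustration of the averaging step, the sketch below simulates a simple form of reverse correlation. It is an assumption-laden toy example rather than the procedure of any study cited here: the simulated observer, the hidden `template`, and both function names are hypothetical. The observer answers "yes" whenever random noise happens to resemble the template, and the representation is recovered by subtracting the average "no" pattern from the average "yes" pattern.

```python
import random


def classification_image(noise_patterns, responses):
    """Mean noise on 'yes' trials minus mean noise on 'no' trials."""
    n_pixels = len(noise_patterns[0])
    yes = [p for p, r in zip(noise_patterns, responses) if r]
    no = [p for p, r in zip(noise_patterns, responses) if not r]

    def mean_pattern(patterns):
        return [sum(p[i] for p in patterns) / len(patterns) for i in range(n_pixels)]

    yes_mean, no_mean = mean_pattern(yes), mean_pattern(no)
    return [y - n for y, n in zip(yes_mean, no_mean)]


def simulate(template, n_trials=2000, seed=0):
    """Hypothetical observer: says 'yes' when the noise matches the template."""
    rng = random.Random(seed)
    noise = [[rng.gauss(0, 1) for _ in template] for _ in range(n_trials)]
    responses = [sum(t * x for t, x in zip(template, p)) > 0 for p in noise]
    return classification_image(noise, responses)
```

With enough trials, the recovered pattern takes the same signs as the hidden template, even though no single stimulus contained it. Note that the result reflects the observer's judgments, so it is perceiver-dependent by construction.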

Reverse inference: Inferring a cause from having
observed its purported effect. For instance, inferring that
a scowl means someone is angry (the conditional probability
p(anger | frown)). In general, reverse inference is
poorly constrained because multiple causes are usually
compatible with any observation.
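
The link between forward and reverse inference can be made concrete with Bayes' rule. The sketch below is illustrative only: the function name is hypothetical, and every probability in it is an invented number chosen to make the arithmetic easy to follow, not an estimate from the literature.

```python
def reverse_inference(p_scowl_given_state, prior):
    """Return p(state | scowl) for every candidate state via Bayes' rule."""
    evidence = sum(p_scowl_given_state[s] * prior[s] for s in prior)
    return {s: p_scowl_given_state[s] * prior[s] / evidence for s in prior}


# Hypothetical forward probabilities p(scowl | state) and base rates p(state).
forward = {"anger": 0.30, "concentration": 0.20, "other": 0.05}
prior = {"anger": 0.10, "concentration": 0.30, "other": 0.60}

posterior = reverse_inference(forward, prior)
```

Under these made-up numbers, a scowl is most likely given anger, yet anger is not the most probable cause of an observed scowl, because concentration is more common and also produces scowls. This is the sense in which reverse inference requires specificity and not just reliability.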

Sensory modalities: The different senses: vision, hearing,
etc.

Specific/specificity: Research conclusions that include
positive as well as negative statements. For instance,
concluding that a scowl signals anger and no emotion
categories other than anger. High specificity is required for
valid reverse inference.

Stereotype: A widely held belief about a category that
is generally believed to be more applicable than it
actually is.

Universal: Something that is common or shared by 
all humans. The source of this commonality (innate or 
learned) is a separate issue. If an effect is universal, it 
generalizes across cultures. 

Validity: Whether an observed variable actually measures
what is claimed. For example, whether a facial
movement reliably expresses an emotion (convergent
validity) and specifically that emotion (discriminative
validity), where the presence of the emotional instance
can be verified by objective means.

Notes

1. Decades of research in social psychology show that humans
automatically try to predict other people’s behavior by inferring
a mental state; this is called mental-state inference or
mentalizing, such as when inferring someone’s emotional state (e.g.,
for a review, see Gilbert, 1998). This research suggests that
inference and prediction are not separate steps (E. R. Smith &
DeCoster, 2000).

2. Words in boldface type appear in the glossary in Appendix A. 

3. To be clear, teaching children how to infer emotions in others
is not a problem because this skill is related to efficient
communication with others. The question is whether children are being
taught information that is scientifically valid and generalizable.

4. A website for the Detego Group (2018) indicated that “The 
methods developped [sic] by Paul Ekman are based on 40 years 
of research and are being taught to the FBI, CIA, Scotland Yard 
and more forensics specialists around the world.” 

5. This empirical emphasis is largely consistent with scientists’
explicit reports of what they believe, according to a recent
survey (Ekman, 2016). Two-hundred forty-eight scientists who
published peer-reviewed articles on the topic of emotion were
asked about their views on what the scientific evidence shows.
Of the 149 (60%) who responded, 119 (80%) indicated that they
believed compelling evidence exists for the hypothesis that certain
emotion categories are expressed with universal facial
configurations or vocal signals; no questions about variability were
included in the survey.

6. In social psychology, this is the distinction between identifying
an action and making an inference about the mental cause
of the action (Gilbert, 1998; Vallacher & Wegner, 1987).

7. This corresponds to the null hypothesis for the true positive 
(in Fig. 3). 

8. To test the specificity hypothesis, we test something called
the false positive: that people frequently scowl when not angry,
meaning that they scowl more frequently than chance would
allow when fearful, sad, confused, hungry, and so on (see Fig.
3). Retaining the null hypothesis for the false positive (i.e.,
that people do not scowl more frequently than they would by
chance when fearful, sad, confused, hungry, etc.) is equivalent
to rejecting the null hypothesis for (i.e., finding support for)
the specificity hypothesis. Rejecting the null hypothesis for the
false positive (because people scowl when they are fearful, sad,
confused, hungry, etc., in addition to when they are angry) is
evidence of no specificity (i.e., retaining the null hypothesis for
the test of specificity).

9. Our decision to focus on the anger, disgust, fear, happiness,
sadness, and surprise categories was reinforced by two observations.
First, consider a recent poll that asked scientists about
their beliefs (Ekman, 2016). Two-hundred forty-eight scientists
who published peer-reviewed articles on the topic of emotion
were given a list of 18 emotion labels and were asked to indicate
which, according to available empirical evidence, have
been established as biological categories with universal expressions.
Of the 149 (60%) who responded,

There was high agreement about five emotions . . .:
anger (91%), fear (90%), disgust (86%), sadness (80%),
and happiness (76%). Shame, surprise, and
embarrassment were endorsed by 40%-50%. Other
emotions, currently under study by various investigators
drew substantially less support: guilt (37%), contempt
(34%), love (32%), awe (31%), pain (28%), envy (28%),
compassion (20%), pride (9%), and gratitude (6%).
(Ekman, 2016, p. 32)

Second, there is no smoking gun in the published research on 
these additional emotion categories—that is, thus far, there are 
no scientific findings related to the production or perception of 
facial expressions for those emotion categories that challenge 
the general conclusions of this article. Simply put, regardless 
of how many emotion categories we evaluated, the pattern of 
findings was the same. 

10. Different numbers of facial muscles are reported in various 
sources depending on how muscles are grouped or divided. 

11. Regarding facial expressions: 

Scientists often refer to a set of actions that occur on the 
face simultaneously as “facial events,” rather than calling 
them facial expressions. It is more descriptive. The word 
“expression” suggests that something from the inside
becomes observable on the outside. Yet not every facial 
behavior expresses an internal state - most probably do 
not. (Rosenberg, 2018) 

12. Box 6 in the Supplemental Material presents a summary
of computer-vision algorithms for automatically detecting facial 
actions. 

13. Changes in illumination and face orientation are currently
major hurdles. 

14. Thirty-eight groups, each with its own face-reading algorithm,
announced their intention to participate in the challenge
(Benitez-Quiroz, Srinivasan, et al., 2017). Groups tuned their
algorithms on the set of training images that were provided 2
weeks before the challenge deadline. Final evaluations were
done on the testing set only. Of the original 38 groups, only 4
submitted results before the challenge ended.

15. These accuracy levels might be considered an upper estimate
because of the characteristics of the training and test-image
databases. The methods for choosing the database are described
in Benitez-Quiroz et al. (2016). Note that a number of images
are posed and professionally taken. Some facial configurations 
are exaggerated. Under these idealized circumstances, manual 
verification of these faces was estimated at 81% accuracy. 

16. It is also possible that an individual person has a repertoire 
of probabilistic physical changes that reliably and specifically 
occur during the instances of a single emotion category; for a 
number of reasons, however, this hypothesis has not yet been 
scientifically tested. Specific studies to address this question 
would be very helpful. 

17. There are ways to get around this circularity by using
unsupervised, data-driven methods to discover categories, but to
date, studies have used supervised approaches in which categories
are prescribed by human inference.

18. By relying on their own beliefs, scientists are using human 
consensus to identify when an emotional episode is occurring 
and which emotion category it belongs to (i.e., when they agree 
that fear or some other emotion is present, then it is said to be 
present). It is important to realize that every single experiment 
dealing with emotion to date relies on human inference in this 
way. Consensus inferences are made in many areas of science.
In physics and astronomy, consensus emerges from expert
scientists whose beliefs and assumptions often challenge the
common-sense view, such as in the case of quantum mechanics,
dark matter, and black holes. In other areas of psychology,
consensus is used to define many mental categories, such as
memory and attention; consensus is also used to define psychiatric
categories, such as schizophrenia and autism. Even defining
depression as a mental as opposed to a physical illness is a
matter of consensus rather than objective ground truth. But it
is noteworthy that when it comes to emotions, scientists use
exactly the same categories as nonscientists, which may give us
cause for concern (as forewarned by William James, 1890/2007,
1894). For example, compare the findings in Box 8 with the
recent survey of scientists who study emotion (Ekman, 2016):
Of 149 scientists who responded, 88 continue to believe that
certain emotion categories have universal physiological markers,
despite meta-analyses showing otherwise.

19. These meta-analytic findings are consistent with an earlier 
summary published by Matsumoto et al. (2008): Of 14 studies 
using rigorous FACS coding by human experts, only 5 reported 
that participants spontaneously displayed some or all of the 
hypothesized AUs during emotions. This is in contrast to the 9 
studies that used the less reliable emotion FACS (EMFACS) coding,
all of which reported support. These findings suggest that
some type of perceptual bias is more likely to creep in when
observers make judgments of whether an AU is present or not
(e.g., indicating whether a participant is smiling, or displaying
“happiness”) than when AUs are coded independently, one at a time.

20. Remote, small-scale cultures are not untouched by Western 
influences. All cultures have some minimal contact with Western 
cultures (and this was also the case for the seminal articles 
published by Ekman and his colleagues in the 1970s; Crivelli &
Gendron, 2017). 

21. The Trobriand Islanders and the Fore are different ethnic
groups; Trobrianders are subsistence fishermen and horticulturalists
living in a small archipelago of islands located 200 km
from the mainland where the Fore (who were photographed)
lived (recall that the Fore photos were judged by Trobrianders
in Crivelli et al., 2017). As Crivelli et al. make clear in their
article, these findings are a within-nation rather than a
within-culture comparison.

22. The value of this particular study is that the researchers 
not only coded infants’ facial movements but also measured a 
range of concurrent movements that could support inferences
about the infants’ feelings of pleasantness, unpleasantness, and 
level of arousal, termed affect (see Box 9 in the Supplemental 
Material), including increased respiration, withdrawal/leaning 
away with the body, stilling/freezing, struggling, turning toward 
the mother, extreme withdrawal, hiding of their faces, squirming,
self-stimulation, looking toward mother, pointing at the
object, doing a “double-take,” and banging on the table. 

23. Bennett et al. (2002) note that when they observed facial 
actions that were thought to be associated with more than one 
emotion category—for example, when an infant produced a 
facial configuration that was a combination of scowling (anger) 
and pouting (sadness)—they interpreted the expression using 
the facial actions in only the upper region of the face, which 
indicates that infants’ facial movements were even more variable
than reported in the data tables. A footnote in the article
further indicates that infants produced facial movements that 
were interpreted to reflect “interest” across all of the eliciting 
situations, but these facial actions were not included in any 
data analyses (Bennett et al., 2002, footnote 1). Any facial
configuration that included AUs stipulated as interest and AUs for
another emotion category was coded as an expression of the 
other emotion category. 

24. In addition, it is not clear that children find sour foods
disgusting (e.g., Rozin, Hammer, Oster, Horowitz, & Marmora, 1986;
Stein, Ottenberg, & Roulet, 1958). Young children appear to be 
attracted to many things that adults find disgusting, whereas 
by the age of five, children have more adult-like behavioral 
responses and reject them (Rozin et al., 1986). For a discussion 
of how disgust is learned, see Widen and Russell (2013). 

25. In another naturalistic study, videos of children ages 4 
through 7 were downloaded from the Internet and FACS coded 
(Shuster, Camras, Grabell, & Perlman, 2018). The children were 
playing “the scary maze game”: A child solves maze after maze 
of increasing difficulty, only to encounter a screaming, demonic 
girl from the movie The Exorcist (filmed in 1973). The game is 
generally thought to evoke an instance of fear (hence the name 
“scary”), but it may also evoke surprise as the scary stimulus
makes a sudden unexpected appearance. Children produced
the wide-eyed, gasping configuration (the proposed facial
expression of fear) and/or a startled configuration (the proposed
facial expression of surprise) with only weak reliability
(38% and 10%, respectively). 

26. By analogy, people who have been blind since birth learn
color concepts, and the relations between these concepts, such
as “red,” “blue,” and “green,” are similar to those of sighted
people (e.g., congenitally blind individuals understand that the
U.S. concept for “blue” is more similar to “green” than to “red”;
Shepard & Cooper, 1992). The structure of brain regions in the
visual cortex that represent visual concepts is also virtually
indistinguishable in sighted and congenitally blind individuals
(Koster-Hale, Bedny, & Saxe, 2014; X. Wang et al., 2015).

27. The onset and severity of blindness vary hugely across
studies. Even a small amount of visual experience in infancy
or early childhood will influence brain development and provide
experiences for learning about emotions (see earlier section
on emotion-concept development in infants). Helen Keller,
for example, could see and hear until she was 19 months 
old, providing some initial scaffolding for her later ability to 
communicate. 

28. For example, Ekman (2017) recently wrote:

Another challenge to the findings of universality came 
from the anthropologist, Margaret Mead . . . Establishing 
that posed expressions are universal, she said, does not 
necessarily mean that spontaneous expressions are universal.
I replied (Ekman, 1977) that it seemed illogical to
presume that people can readily interpret posed facial 
expressions if they had not seen those facial expressions 
and experienced them in actual social life. (p. 46) 

29. Although these findings are instructive, they probably provide
a lower limit of the possible real-world variation in the
facial configurations that express the varied instances of a given
emotion category. After all, the Internet is a curated version of
reality, and some frequent facial configurations are likely missing
because they are rarely uploaded to the Internet. Likewise,
some configurations commonly found on the Internet might not 
be commonly observed in the real world. 

30. Compare these findings with those from a study that mined
images from the Internet using a similar but narrower approach
and that had two raters use a choice-from-array method to
label the images (Mollahosseini et al., 2016).

31. Configuration 3 also resembles people’s beliefs about the
configurations that express fear and awe (i.e., the “international
core patterns” reported by Cordaro et al., 2018).

32. More generally, participants are more likely to perceive the
intended emotion in the hypothesized facial configurations of
Figure 4 when they are displayed on dynamically moving, synthetic
faces (Wehrle, Kaiser, Schmidt, & Scherer, 2000), in video
footage of posed facial muscle movements (e.g., Ambadar,
Schooler, & Cohn, 2005; Cunningham & Wallraven, 2009), and
even in point-light displays of motion created by facial muscle
movements (Bassili, 1979). This “dynamic advantage” sometimes
disappears when participants are viewing real human
faces (e.g., Fiorentini & Viviani, 2011; Gold et al., 2013; Miles &
Johnston, 2007; N. L. Nelson & Russell, 2011).

33. Ekman and Friesen (1971) was chosen as one of the 40
studies that changed psychology (Hock, 2009) and, along with
Ekman et al. (1969), is routinely discussed in introductory
psychology textbooks.

34. Dioula participants from Burkina Faso in West Africa
showed strong reliability for labeling smiling facial configurations
as happiness, moderate reliability for labeling frowning
facial configurations as sadness, startled facial configurations
as surprise, and nose-wrinkled facial configurations as disgust.
They showed weak reliability for labeling scowling facial
configurations as anger and wide-eyed, gasping facial configurations
as fear.

35. For example, a sample of Trobriand Islanders, who are
subsistence horticulturalists and fishermen living in the Trobriand
Islands of Papua New Guinea, labeled a scowling facial
configuration as anger with above-chance reliability (29% of the
time), but also labeled that facial configuration more frequently
with “feels like avoiding a social interaction” (50% of the time;
Crivelli et al., 2017, Study 2). In fact, the wide-eyed, gasping
facial configuration that is thought to be the expression for fear
(Fig. 4) is understood as an expression of aggression or threat
in the Trobriand culture (Crivelli et al., 2017; Crivelli, Jarillo,
& Fridlund, 2016; Crivelli, Russell, Jarillo, & Fernandez-Dols,
2016). Trobrianders uniquely labeled smiling facial configurations
as happiness across two studies, but this finding was not
replicated in a third sample or in a sample of Mwani participants,
who are subsistence fishermen living on Matemo Island
in Mozambique, Africa.

36. The ancestors of the Hadza are thought to have been continuously
practicing a hunting-and-gathering lifestyle for at
least the past 50,000 years in their current region of East Africa.
Furthermore, Hadza social structure, mobility, residential patterns,
and language have thus far remained largely buffered
from their interactions with other ethnic groups (Apicella &
Crittenden, 2016; Crittenden & Marlowe, 2008), which have
been sustained for at least the past 100 years (Jones, 2016).

37. It has been suggested that the wide-eyed, gasping stereotype 
for fear evolved for enhanced sensory sampling that supports 
efficient threat detection (Susskind et al., 2008). Likewise, the 
nose-wrinkle stereotype for disgust is thought to have evolved 
to limit exposure to noxious stimuli (Chapman & Anderson,
2013; Chapman, Kim, Susskind, & Anderson, 2009).

38. Note that adult perceivers may have done less overt looking
at the postures, but other evidence with the same stimuli
suggests that different body contexts influenced how adult participants
visually scanned the exact same facial configurations
(Aviezer et al., 2008). At the other end of the age spectrum, older
adults are also more influenced by context than are young
adults when inferring emotional meaning in facial configurations
(Ngo & Isaacowitz, 2015).

39. Some applications will not be affected by context because
they are not aiming to use facial movements to infer an individual’s
underlying emotional state. These initiatives have very
specific applications in mind. For example, detecting pain in 
patients (Apple), driver drowsiness (Google), creating virtual 
facial expression stickers or animojis from one’s own facial 
poses (Facebook, iPhone X), or Alibaba’s “smile to pay.” 

40. The word “appraisal” has two meanings in the science of
emotion. Here, appraisals simply refer to the descriptive
features of how a situation is experienced (e.g., novelty, goal
relevance) without any inference about how those experiential
features are caused (e.g., Clore & Ortony, 2008, 2013). The
other meaning of appraisal refers to the mechanisms that cause
the experiential features as components of emotion (e.g., the
component process model of emotion, in which appraisals are
considered evaluative “checks” that the human mind uses in a
serial fashion; e.g., Scherer, Mortillaro, & Mehu, 2017). There is
very little evidence that appraisals are, in fact, causal mechanisms
(for a discussion, see Parkinson, 1997). In some studies,
for example, participants are presented with a written scenario
that is assumed to automatically trigger a specific sequence of
appraisal checks (i.e., cognitive evaluations), which in turn is
hypothesized to produce a specific pattern of facial muscle
movements. Notice that the main causal mechanisms here
(appraisal checks) are not measured directly but are inferred to
have occurred. In other studies, participants are asked to explicitly
report on the appraisals they experience, on the assumption
that the corresponding “checks” are active. There is abundant
evidence, however, that although people can explicitly report
on what they are feeling or doing, they are very inaccurate
at reporting on the causes of their feelings and actions (i.e.,
they cannot accurately report on psychological mechanisms;
Nisbett & Wilson, 1977; T. D. Wilson, 2002). Emerging scientific
evidence links appraisals, as descriptive features, to facial
movements, although the evidence to date suggests that these
relationships are not as reliable or as specific as hypothesized
(a summary of this research program can be found in Scherer
et al., 2017).